EXHIBIT A

FUNCTION 

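BestFit makes an optimal alignment of the best segment of similarity between two sequences. Optimal
alignments are found by inserting gaps to maximize the number of matches using the local homology
algorithm of Smith and Waterman.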

DESCRIPTION

BestFit inserts gaps to obtain the optimal alignment of the best region of similarity between two
sequences, and then displays the alignment in a format similar to the output from Gap. The sequences
can be of very different lengths and have only a small segment of similarity between them. You could
take a short RNA sequence, for example, and run it against a whole mitochondrial genome.

SEARCHING FOR SIMILARITY

BestFit is the most powerful method in the Wisconsin Sequence Analysis Package™ for identifying the
best region of similarity between two sequences whose relationship is unknown.

EXAMPLE 

The sequence gamma.seq contains an Alu family sequence somewhere in the first 500 bases. alu.seq
contains a generic human Alu family repeat. The two sequences are aligned and the best segment of
similarity is found with BestFit.

% bestfit 

BESTFIT of what sequence 1 ?  gamma.seq

Begin (* 1 *) ?
End (* 11375 *) ?  500
Reverse (* No *) ?

to what sequence 2 (* gamma.seq *) ?  alu.seq

Begin (* 1 *) ?
End (* 207 *) ?
Reverse (* No *) ?

What is the gap creation penalty (* 5.00 *) ?

What is the gap extension penalty (* 0.30 *) ?

What should I call the paired output display file (* gamma.pair *) ?





         Gaps:       3
      Quality:   129.3
Quality Ratio:   0.625
 % Similarity:  84.466
Here is the output file. Notice how BestFit finds and displays only the best segments of similarity:

BESTFIT of: gamma.seq  check: 6474  from: 1  to: 500

Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies, Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

to: alu.seq  check: 4238  from: 1  to: 207

HSREPa from the EMBL data library
Human Alu repetitive sequence located near the insulin gene
Dhruva D.R., Shenk T., Subramanian K.N.; "Integration in vivo into
Simian virus 40 DNA of a sequence that resembles a certain family of
genomic interspersed repeated sequences"; Proc. Natl. Acad. Sci. USA
77:4514-4518(1980). ...

Symbol comparison table: swgapdna.cmp

Gap Weight: 5.000        Average Match: 1.000
Length Weight: 0.300     Average Mismatch: -0.900

Quality: 129.3           Length: 209
Ratio: 0.625             Gaps: 3
Percent Similarity: 84.466   Percent Identity: 84.466

gamma.seq x alu.seq   June 20, 1994 15:15

137 AGACCAACCTGGCCAACATGGTGaAArCCCATCTCTAC.aAAAATACAAA 18S 
INII.I li:!llllllllllll||| llllliHi, ,11111,11,, 

1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGJIAAAXACAAA 50 

186 AATTAGACAGGCATGArGGO^TGCCTGTAATCCCAGCTAClTGGGAGG 235 

51 AAXTAecC&GGCATGGrGSTGaGTGCCTGS&ATCCCAGCTACTiaSGAGG 100 

236 CTGAGGRAGGAGAATTGCTTGRACCTGGAAGGCAGGAGTTGC&GTGAGCC: 285 

•III',. II .lim__JII_llll.^ I III I 'llllillllllll 
101 CTGA<aCA£.:ulGAATtcfcTTA2UlCaC^G?AG<^SdSGTTGCACT^ 149 

28S GAGATCATACCACTGCACTCCAGCCTGGGTGJSCAGZUiCJiAGAeTCTGTCr 335 

iiNiir-ii-Jiiiiiiiiiiiii iiiimiKi- null ■ill 

150 GAGATa3dACGGCTGCACTCCAGarr!?GGTGACAGA^&U5ACTCCSTCr 198 

336 CAAAAAAA.2^ 344 

i • I ■ • • . 

195 C.\A.%AA.aaA 20" 
RELATED PROGRAMS



When you want an alignment that covers the whole length of both sequences, use Gap. When you are
trying to find only the best segment of similarity between two sequences, use BestFit. PileUp creates a
multiple sequence alignment of a group of related sequences, aligning the whole length of all sequences.
DotPlot displays the entire surface of comparison for a comparison of two sequences. GapShow
displays the pattern of differences between two aligned sequences. PlotSimilarity plots the average
similarity of two or more aligned sequences at each position in the alignment. Pretty displays
alignments of several sequences. LineUp is an editor for editing multiple sequence alignments.
CompTable helps generate scoring matrices for peptide comparison.



ALGORITHM 



BestFit uses the local homology algorithm of Smith and Waterman (Advances in Applied
Mathematics 2; 482-489 (1981)) to find the best segment of similarity between two sequences. BestFit
reads a scoring matrix that contains values for every possible GCG symbol match (see the LOCAL
DATA FILES topic below). The program uses these values to construct a path matrix that represents
the entire surface of comparison with a score at every position for the best possible alignment to that
point. The quality score for the best alignment to any point is equal to the sum of the scoring matrix
values of the matches in that alignment, less the gap creation penalty times the number of gaps in that
alignment, less the gap extension penalty times the total length of all gaps in that alignment. The gap
creation and gap extension penalties are set by you. If the best path to any point has a negative value,
a zero is put in that position.

After the path matrix is complete, the highest value on the surface of comparison represents the end of
the best region of similarity between the sequences. The best path from this highest value backwards
to the point where the values revert to zero is the alignment shown by BestFit. This alignment is the
best segment of similarity between the two sequences.
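
The path-matrix construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BestFit source: the function name, the three-matrix layout, and the use of a single match value and a single mismatch value in place of a full scoring matrix are assumptions made for the example. A gap of k symbols is charged the gap creation penalty once plus k times the gap extension penalty, which follows the quality definition in the text.

```python
def local_alignment_score(seq1, seq2, match=1.0, mismatch=-0.9,
                          gap_creation=5.0, gap_extension=0.3):
    """Return (best_score, end1, end2) for the best local alignment.

    Hypothetical helper: a gap of k symbols costs
    gap_creation + k * gap_extension.
    """
    neg = float("-inf")
    n, m = len(seq1), len(seq2)
    # H: best score of any path ending at (i, j), never below zero.
    # E: best score of a path ending with a gap in seq1 at (i, j).
    # F: best score of a path ending with a gap in seq2 at (i, j).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    open_cost = gap_creation + gap_extension  # cost of the first gap symbol
    best = (0.0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - open_cost, E[i][j - 1] - gap_extension)
            F[i][j] = max(H[i - 1][j] - open_cost, F[i - 1][j] - gap_extension)
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # A negative best path is replaced by zero, as the text describes.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# The GACCAT / GACAT example with gap weight 1.0 and length weight 0.0.
print(local_alignment_score("GACCAT", "GACAT",
                            gap_creation=1.0, gap_extension=0.0))
```

The returned coordinates mark the highest value on the surface; tracing back from there to the first zero would recover the alignment itself, which this sketch omits for brevity.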

For nucleic acids, the default scoring matrix has a match value of 1.0 for each identical symbol
comparison and -0.90 for each non-identical comparison (not considering nucleotide ambiguity symbols
for this example). The quality score for a nucleic acid alignment can, therefore, be determined using
the following equation:

Quality = 1.0 x TotalMatches + -0.90 x TotalMismatches
        - (GapCreationPenalty x GapNumber)
        - (GapExtensionPenalty x TotalLengthOfGaps)
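
As a check on the equation, the figures reported in the example output file can be plugged in. The match and mismatch counts below are not printed in the output; they are inferred here on the assumption that the 84.466 percent identity corresponds to 174 matches among 206 aligned pairs and that each of the 3 gaps is one symbol long (209 alignment positions less 206 pairs). The function name is hypothetical.

```python
def nucleic_quality(matches, mismatches, gap_number, total_gap_length,
                    gap_creation=5.0, gap_extension=0.3,
                    match_value=1.0, mismatch_value=-0.9):
    """Quality score for a nucleic acid alignment, per the equation above."""
    return (match_value * matches
            + mismatch_value * mismatches
            - gap_creation * gap_number
            - gap_extension * total_gap_length)


# Inferred counts for the gamma.seq x alu.seq example (see the assumptions above).
quality = nucleic_quality(matches=174, mismatches=32,
                          gap_number=3, total_gap_length=3)
print(round(quality, 1))
```

Under those assumptions the result agrees with the Quality of 129.3 shown in the output file.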

The quality score for a protein alignment is calculated in a similar manner. However, while the default
nucleic acid scoring matrix has a single value for all non-identical comparisons, the default protein
scoring matrix has different values for the various non-identical amino acid comparisons. The quality
score for a protein alignment can therefore be determined using the following equation (where Total_AA
is the total number of A-A (Ala-Ala) matches in the alignment, CmpVal_AA is the value for an A-A
comparison in the scoring matrix, Total_AB is the total number of A-B (Ala-Asx) matches in the
alignment, CmpVal_AB is the value for an A-B comparison in the scoring matrix, ...):

Quality = CmpVal_AA x Total_AA
        + CmpVal_AB x Total_AB
        + CmpVal_AC x Total_AC
        ...
        + CmpVal_ZZ x Total_ZZ
        - (GapCreationPenalty x GapNumber)
        - (GapExtensionPenalty x TotalLengthOfGaps)

For a more complete discussion of scoring matrices, see the Data Files manual.
CONSIDERATIONS

BestFit Always Finds Something

BestFit always finds an alignment for any two sequences you compare - even if there is no
significant similarity between them! You must evaluate the results critically to decide if the
segment shown is not just a random region of relative similarity.

The Segments Shown Obscure Alternative Segments

BestFit only shows one segment of similarity, so if there are several, all but one is obscured. You
can approach this problem with graphic matrix analysis (see the Compare and DotPlot
programs). Alternatively, you can run BestFit on ranges outside the ranges of similarity found
in earlier runs to bring other segments out of the shadow of the best segment.

The Best Fit is Only One Member of a Family

Like all fast gapping algorithms, the alignment displayed is a member of the family of best
alignments. This family may have other members of equal quality, but will not have any
member with a higher quality. The family is usually significantly different for different choices
of gap creation and gap extension penalties. See the CONSIDERATIONS topic in the entry for
Gap.

The Surface of Comparison 

The time required for an alignment is proportional to the area of the surface of comparison.
That area is determined by the product of the lengths of the two sequences compared. BestFit
can evaluate a surface of up to 3.5 million elements. This surface would be large enough to
compare two sequences approximately 1,870 symbols long, or one sequence 200 symbols long
with another sequence 17,500 symbols long. When you have much longer sequences, you can
use the command-line option -LIMit to use the surface more efficiently.

The Public Scoring Matrix for Nucleic Acid Comparisons is Very Stringent

The public scoring matrix swgapdna.cmp penalizes mismatches -0.9, so the segments found may
be very brief. This penalty means that the alignment cannot be extended by three bases to pick
up an extra match. The scoring matrix used by Smith and Waterman, when local alignments were
first described, used -0.333 for the mismatch penalty. You can use Fetch to copy a matrix with
these values and rename it swgapdna.cmp, or use nwsgapdna.cmp, which has no mismatch penalty.

Rapid Alignment 

When possible, BestFit tries to find the optimal alignment very quickly. If this rapid alignment
is not unambiguously optimal, BestFit automatically realigns the sequences to calculate the
optimal alignment. When this happens, the progress monitor on your terminal screen
('Aligning ...') is displayed twice for a single alignment.

ALIGNING LONG SEQUENCES 

This program can align very long sequences if you know roughly where the alignment of interest
begins. Run the program with the command-line option -LIMit, then set the starting coordinates for
each sequence near the point where the alignment of interest begins and set gap shift limits on each
sequence. The program then aligns the sequences from your starting point, ensuring that the sequences
do not get out of phase by more than the gap shift limits you have set.

The program finds the best alignment within this incomplete, or limited, surface of comparison and
then determines whether the alignment could possibly be improved by allowing larger gaps in each
sequence. If the program cannot rule out this possibility, it displays a message. The alignment may
nevertheless be optimal even if this message is displayed.

EVALUATING ALIGNMENT SIGNIFICANCE

To help you evaluate the significance of an alignment, the second sequence can be repeatedly
shuffled, maintaining its length and composition, and then realigned to the first sequence
with the -RANdomizations command-line option. The score of each randomized alignment is
reported to the screen. You can use <Ctrl>C to interrupt the randomizations and output the
results from the randomized alignments that have been completed.

Because of the statistical properties of biological sequences, this simple Monte Carlo statistical
method may give misleading results. Please see Lipman, D.J., Wilbur, W.J., Smith, T.F., and
Waterman, M.S. (Nucl. Acids Res. 12; 215-226 (1984)) for a discussion of the statistical
significance of nucleic acid similarities.

ALIGNMENT METRICS 

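BestFit and Gap display four figures of merit for alignments: Quality, Ratio, Identity, and Similarity.

Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by
the number of bases in the shorter segment. Percent Identity is the percent of the symbols that
actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that
are across from gaps are ignored. A similarity is scored when the scoring matrix value for a
pair of symbols is greater than or equal to 0.50, the similarity threshold. This threshold is
also used by the display procedure to decide when to put a ':' (colon) between two aligned
symbols. You can reset it from the command line with the second parameter of -PAIr.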

-P^^X.O^O.Swouldslt'Tj^r^tewS^^^^ ^-^--.theeicpression 

Z^::l^r7Si^e^s^ ^^^^^'^ ^ aZ^^n. pn,^.^ tHey sHould not he used 

PEPTIDE SEQUENCES 

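If your input sequences are peptide sequences, this program uses a scoring matrix in which
mismatches are scored according to the evolutionary distance between the amino acids as
measured by Dayhoff and normalized by Gribskov (Gribskov and Burgess, Nucl. Acids Res. 14(16);
6745-6763 (1986)).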
RESTRICTIONS

The surface of comparison is restricted to the equivalent of a 2,300 x 2,300 comparison. See the
ALIGNING LONG SEQUENCES topic for a method of aligning sequences that would normally
exceed the maximum surface of comparison.

SEQUENCE TYPE

The function of BestFit depends on whether your input sequences are protein or nucleotide.
The type of a sequence is determined by the presence of either Type: N or Type: P on the last
line of the text heading just above the sequence. See Appendix VI for information on how to
change or set the type of a sequence.
COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the
summary below.

Minimal Syntax: % bestfit [-INfile1=]gamma.seq [-INfile2=]alu.seq -Default

Prompted Parameters:

-BEGin1=1 -BEGin2=1         beginning of each sequence
-END1=500 -END2=207         end of each sequence
-NOREV1 -NOREV2             strand of each sequence
-GAPweight=5.0              gap creation penalty (3.0 is protein default)
-LENgthweight=0.3           gap extension penalty (0.1 is protein default)
[-OUTfile1=]gamma.pair      output file for alignment

Local Data Files:

-DATa=swgapdna.cmp          scoring matrix for nucleic acids
-DATa=swgappep.cmp          scoring matrix for peptides

Optional Parameters:

-OUTfile2=seqname1.gap      new sequence file for sequence 1 with gaps added
-LIMit1=20 -LIMit2=20       limits the surface of comparison
-RANdomizations[=10]        determine average score from 10 randomized alignments
-PAIr=1.0,0.5,0.1           thresholds for displaying '|', ':', and '.'
-WIDth=50                   the number of sequence symbols per line
-PAGe=60                    adds a line with a form feed every 60 lines
-NOBIGGaps                  suppresses abbreviation of large gaps with '.'s
-LOWroad                    makes the bottom alignment for your parameters
-HIGhroad                   makes the top alignment for your parameters
-NOSUMmary                  suppresses the screen summary
Gap and BestFit were originally written for Version 1.0 by Paul Haeberli from a careful reading of the
Needleman and Wunsch (J. Mol. Biol. 48; 443-453 (1970)) and the Smith and Waterman
(Adv. Appl. Math. 2; 482-489 (1981)) papers.

Limited alignments were designed by Paul Haeberli and added to the Package for Version 3.0. They
were united into a single program by Philip Delaquess for Version 4.0. Default gap penalties for
protein alignments were modified according to the suggestions of Rechid, Vingron and Argos (CABIOS
5; 107-113 (1989)).

LOCAL DATA FILES 

The files described below supply auxiliary data to this program. The program automatically reads
them from a public data directory unless you either 1) have a data file with exactly the same name in
your current working directory; or 2) name a file on the command line with an expression like
-DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

If the first sequence you name is a nucleic acid, BestFit uses the scoring matrix in the public file
swgapdna.cmp. (SW stands for Smith and Waterman.) If the first sequence you name is a peptide
sequence, BestFit reads swgappep.cmp instead. The presence of these files in your current working
directory causes BestFit to read your version instead. (See the Data Files manual for more
information about scoring matrices.)



OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information,
see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.

-LIMit1=20 and -LIMit2=20

let you set gap shift limits for each sequence. When you already know of a long similarity
between two sequences, you can "zip" them together using this mode. The beginning coordinates
for each sequence must be near the beginning of the alignment you want to see. The alignment
continues so that gaps inserted do not require the sequences to get out of step by more than the
gap shift limits. You can align very long sequences rapidly. The surface of comparison is still
limited to 3.5 million. The size of a comparison can be predicted by multiplying the average
length of the two sequences by the sum of the two shift limits.

If you add -LIMit to the command line without any qualifier value, the program prompts you to
enter gap shift limits for each sequence.
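
The two size rules quoted in this manual (the product of the sequence lengths for a full comparison, and the average length times the sum of the shift limits for a limited one) can be compared with a short calculation. The function names and the 50,000-symbol example are hypothetical; the 3.5 million limit is the figure given in the text.

```python
MAX_SURFACE = 3_500_000  # surface limit quoted in the text


def full_surface(len1, len2):
    """Surface of an unrestricted comparison: product of the two lengths."""
    return len1 * len2


def limited_surface(len1, len2, limit1=20, limit2=20):
    """Predicted surface with gap shift limits: the average length of the
    two sequences times the sum of the two shift limits."""
    return (len1 + len2) / 2 * (limit1 + limit2)


# Two hypothetical 50,000-symbol sequences.
print(full_surface(50_000, 50_000) <= MAX_SURFACE)     # far too large unrestricted
print(limited_surface(50_000, 50_000) <= MAX_SURFACE)  # fits with limits of 20
```

With shift limits of 20 on each sequence, the predicted surface for this pair is 2 million elements, which is why long sequences that could never be compared in full can still be aligned in limited mode.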

-RANdomizations=10

reports the average alignment score and standard deviation from 10 randomized alignments in
which the second sequence is repeatedly shuffled, maintaining the length and composition of the
original sequence, and then aligned to the first sequence. You can use the optional parameter to
set the number of randomized alignments to some number other than 10.
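
The shuffling procedure can be sketched as follows. This is a rough illustration under stated assumptions, not the program's implementation: the scorer is a compact local-alignment score with one match value and one mismatch value standing in for the scoring matrix, and the function names and the fixed random seed are choices made for the example.

```python
import random
import statistics


def sw_score(a, b, match=1.0, mismatch=-0.9,
             gap_creation=5.0, gap_extension=0.3):
    """Best local alignment score; a gap of k symbols costs
    gap_creation + k * gap_extension (hypothetical helper)."""
    neg = float("-inf")
    prev_h = [0.0] * (len(b) + 1)
    prev_f = [neg] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * (len(b) + 1)
        f = [neg] * (len(b) + 1)
        e = neg  # best path ending in a gap along the current row
        for j in range(1, len(b) + 1):
            e = max(h[j - 1] - gap_creation - gap_extension, e - gap_extension)
            f[j] = max(prev_h[j] - gap_creation - gap_extension,
                       prev_f[j] - gap_extension)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[j] = max(0.0, prev_h[j - 1] + s, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def randomized_scores(seq1, seq2, n=10, seed=0):
    """Mean and standard deviation of scores against shuffled copies of seq2."""
    rng = random.Random(seed)
    symbols = list(seq2)
    scores = []
    for _ in range(n):
        rng.shuffle(symbols)  # preserves length and composition
        scores.append(sw_score(seq1, "".join(symbols)))
    return statistics.mean(scores), statistics.stdev(scores)


mean, sd = randomized_scores("ACGTACGTACGTTTGA", "ACGTTGCAAC", n=10)
print(f"average quality of randomized alignments: {mean:.2f} +/- {sd:.2f}")
```

Comparing the real alignment's quality with this average gives only a rough indication of significance, for the reasons given under EVALUATING ALIGNMENT SIGNIFICANCE.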

-OUTfile2=seqname1.gap -OUTfile3=seqname2.gap

This program can write three different output files. The first displays the alignment of sequence
one with sequence two. The second is a new sequence file for sequence one, possibly expanded by
gaps to make it align with sequence two. The third, like the second, is a new sequence file for
sequence two, possibly expanded by gaps to make it align with sequence one. The program
writes only the first file unless there are output file options on the command line. If there are
any output files named on the command line, only those output files are written. If you add
-OUT to the command line without any qualifying filename, then the program will write the
second and third output files after prompting you for their names.

Aligned sequences (in sequence files) can be displayed with GapShow. Their similarity can be 
displayed with PlotSimilarity. 



-PAIr=1.0,0.5,0.1

The paired output file from this program displays sequence similarity by printing one of three
characters between similar sequence symbols: a pipe character (|), a colon (:), or a period (.).
Normally a pipe character is put between symbols that are the same, a colon is put between
symbols whose comparison value is greater than or equal to 0.50, and a period is put between
symbols whose comparison value is greater than or equal to 0.10. You can change these match
display thresholds from the command line. The three parameters for -PAIr are the display
thresholds for the pipe character, colon, and period. The match display criterion for a pipe
character changes from symbolic identity (the default) to the quantitative threshold you have set
in the first parameter. A pipe character will no longer be inserted between identical symbols
unless their comparison values are greater than or equal to this threshold. If you still want a
pipe character to connect identical symbols, use x instead of a number as the first parameter.
(See the Data Files manual for more information about scoring matrices.)
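
The display rule can be summarized in a small function. This is a sketch of the rule as described in the paragraph above, not code from the program; the function name and the comparison values used in the calls are hypothetical.

```python
def display_char(sym1, sym2, value, pipe="x", colon=0.5, period=0.1):
    """Character printed between two aligned symbols.

    pipe="x" keeps the default rule (pipe for identical symbols); a number
    switches the pipe criterion to a comparison-value threshold.
    """
    if pipe == "x":
        is_pipe = sym1 == sym2
    else:
        is_pipe = value >= pipe
    if is_pipe:
        return "|"
    if value >= colon:
        return ":"
    if value >= period:
        return "."
    return " "


print(repr(display_char("A", "A", 1.0)))            # identical symbols
print(repr(display_char("K", "R", 0.6)))            # hypothetical similar pair
print(repr(display_char("A", "G", -0.9)))           # mismatch
print(repr(display_char("A", "A", 0.8, pipe=1.0)))  # identity below a numeric threshold
```

The last call shows the behavior the text warns about: once the first parameter is a number, identical symbols whose comparison value falls below it no longer receive a pipe character.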



-PAGe=64

When you print the output from this program, it may cross from one page to another in a
frustrating way - especially when you print on individual sheets. This option adds form feeds to
the output file in order to try to keep clusters of related information together. You can set the
number of lines per page by supplying a number after the -PAGe qualifier.



-WIDth=50

puts 50 sequence symbols on each line of the output file. You can set the width to anything from
10 to 150 symbols.



-NOBIGGaps 



suppresses large gap abbreviations, showing all the sequence characters across from large gaps.
Usually, gaps that extend one sequence by more than one complete line of output are abbreviated
with three dots arranged in a vertical line.



-LOWroad and -HIGhroad

The insertion of gaps is, in many cases, arbitrary, and equally optimal alignments can be
generated by inserting gaps differently. When equally optimal alignments are possible, this
program can insert the gaps differently if you select either the -LOWroad or the -HIGhroad
options. Here are examples for the alignment of GACCAT with GACAT with different
parameters.



For:       Match = 1.0        MisMatch = -0.9
      Gap weight = 1.0   Length Weight =  0.0

HighRoad:   1 GACCAT 6
              ||| ||
            1 GAC.AT 5      Quality = 4.0

LowRoad:    1 GACCAT 6
              || |||
            1 GA.CAT 5      Quality = 4.0

For:       Match = 1.0        MisMatch = 0.0
      Gap weight = 3.0   Length Weight = 0.0

HighRoad:   1 GACCAT 6
              |||
            1 GACAT. 5      Quality = 3.0

LowRoad:    1 GACCAT 6
                 |||
            1 .GACAT 5      Quality = 3.0



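Essentially the low road shifts all of the arbitrary gaps in sequence two to the left and all of the
arbitrary gaps in sequence one to the right. The high road does the opposite. When neither
high road nor low road is selected, the program uses the high road alternative for collisions
in the top half of the path matrix and the low road alternative for collisions in the bottom
half of the path matrix.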



-SUMmary

Use this parameter to include a summary of the program's work in the log file for a program run in
batch.



Printed: July 13, 1995 08:19 (1162)
Characterization of changes in gene expression associated with malignant
transformation by the NF-κB family member, v-Rel

Oleksi Petrenko, Irene Ischenko and Paula J Enrietto

Department of Microbiology, State University of New York at Stony Brook, Stony Brook, New York 11794, USA



In this study, alterations in gene expression patterns have
been examined in v-Rel-transformed avian bone marrow
cells. Using a conditional v-Rel estrogen receptor
chimera (v-RelER) which transforms cells in an
estrogen-dependent manner, we constructed subtraction
cDNA libraries from v-RelER-transformed bone marrow
cells. Several different sequences were identified whose
expression was altered upon hormone activation of
v-RelER. These include two genes related to the MIP-1
chemokine family (mip-1β and a tca3 homologue), a cell
surface antigen sca-2 and the transcription factor nfkb1.
The expression of each gene was assayed in a number of
wild-type and mutant v-Rel-expressing fibroblast and
hematopoietic cells. All v-Rel-transformed hematopoietic
cells tested express high levels of nfkb1 and sca-2. In
fibroblasts, wild-type v-Rel induced expression of mip-1β
and nfkb1, while nontransforming mutants of v-Rel failed
to do so, suggesting a role for these two genes in
v-Rel-mediated transformation. Finally, these genes are
expressed at high levels in cells overexpressing wild-type
and truncated forms of c-Rel, implying that v-Rel
transforms, in part, by induction of c-Rel target genes.

Keywords: v-Rel; NF-kappa B; oncogene; transformation



Introduction 

The Rel/NF-κB transcription factor family includes
proteins structurally related through an amino terminal
region, the Rel Homology Domain. This region is
largely responsible for several properties of the
proteins, including homo- and heterodimer formation,
DNA binding, and interaction with a family of
inhibitor proteins. All members of the Rel/NF-κB
family have been implicated in the regulation of
transcription by virtue of their interaction with the
κB enhancer, a potent cis-regulatory sequence present
in many inducible cellular and viral genes. Active
NF-κB transcription complexes are homo- and heterodimers
consisting of one or two members of the Rel/NF-κB
family, which includes NF-κB1, NF-κB2, RelA,
RelB and c-Rel (Grilli et al., 1993; Siebenlist et al.,
1994). The activity of NF-κB complexes is regulated by
a second family of proteins, the IκB family (Beg and
Baldwin, 1993; Verma et al., 1995).

The target genes for Rel/NF-κB regulation are
numerous, and are, for the most part, involved in
cellular growth and immunoregulatory processes (Grilli
et al., 1993; Baeuerle and Baltimore, 1996). Alterations
in several members of the family have been associated
with hematopoietic malignancies. Thus, the genes
encoding c-Rel, RelA, NF-κB1, NF-κB2 and Bcl-3
are located at sites of genomic rearrangements in
certain human cancers (Lu et al., 1991; Neri et al.,
1991; Liptay et al., 1992; Ohno et al., 1993). Targeted
disruption of c-rel, relb, and nfkb1 leads to multiple
functional defects in the immune system (Köntgen et
al., 1995; Sha et al., 1995; Weih et al., 1995), while the
disruption of rela and ikba results in embryonic or
neonatal lethality (Beg et al., 1995; Klement et al.,
1996).

Given the role that Rel/NF-κB proteins play in
normal and oncogenic processes, it is critical to
understand the mechanism by which altered Rel
expression generates the leukemic phenotype. v-Rel, a
mutated homologue of c-Rel, remains the most
carefully studied Rel/NF-κB family member with
respect to its oncogenic potential. Isolated as the
oncogene within the avian retrovirus, REV-T, v-Rel
induces a rapidly fatal hematopoietic malignancy in
birds and transforms fibroblasts, splenic cultures and
bone marrow hematopoietic progenitor cells in vitro
(Bose, 1992). Previous studies revealed that v-Rel
forms protein complexes with other Rel/NF-κB
proteins and binds to NF-κB motifs in vitro (Gilmore
et al., 1996). While early works suggested that v-Rel
acts as a dominant negative mutant of c-Rel (Gilmore,
1990; Bose, 1992), recent data indicate that transformation
by v-Rel results from its capacity to positively or
negatively alter expression of genes important for
hematopoietic cell growth and differentiation
(Baeuerle and Baltimore, 1996; Gilmore et al., 1996).

To more fully understand the mechanism of v-Rel
transformation, we investigated the transcriptional
changes induced in hematopoietic cells transformed
by a conditional form of v-Rel. Fusion of v-Rel to the
hormone-binding domain of human estrogen receptor
(v-RelER, Boehmelt et al., 1992) resulted in the
creation of a chimeric protein whose biological and
biochemical properties were inducible and indistinguishable
from wild-type v-Rel. Utilizing subtractive
cDNA libraries derived from v-RelER-transformed
hematopoietic cells grown in the presence or absence
of estrogen, we identified several different sequences
whose expression was altered upon hormone activation
of v-RelER. These include two genes related to the
MIP-1 chemokine family (mip-1β and a tca3 homologue),
a cell surface antigen sca-2, and the transcription
factor nfkb1. The expression of each of these genes
was assayed in a number of wild-type and mutant
v-Rel-expressing fibroblast and hematopoietic cells. All
v-Rel-transformed hematopoietic cells tested expressed
high levels of nfkb1 and sca-2. In fibroblasts, wild-type
v-Rel induced expression of mip-1β and nfkb1, while
nontransforming mutants of v-Rel failed to do so,
suggesting a role for these two genes in v-Rel-mediated
transformation. Finally, these genes are expressed at
high levels in cells overexpressing wild-type and
truncated forms of c-Rel, implying that v-Rel transforms,
in part, by induction of c-Rel target genes.



Results 

Analysis of subtraction cDNA libraries 

To understand the changes in gene expression that
accompany transformation by v-Rel, we sought to
identify genes whose levels change upon hormone
activation of v-RelER. This was accomplished by the
construction of subtractive cDNA libraries from
estrogen-induced v-RelER cells and from the cells
withdrawn from estrogen (see Materials and methods).
Hybridization and sequence analysis of the clones
isolated from the subtraction libraries permitted the
identification of twelve different cDNAs. The determined
sequences of clones 59 (IκBα), 256 (NF-κB1)
and 393 (CAP-23) were identical to the corresponding
sequences of chicken cDNAs. The resolved sequence of
clone 220 exhibited 80% homology with the
Drosophila actin-related protein. Other identified
genes revealed high level homology with the corresponding
mammalian sequences. Table 1 summarizes
the clones identified and the size of the corresponding
mRNA transcripts.

One group of genes, present at 3-10-fold higher
levels in estrogen-stimulated v-RelER cells, included
the two putative cytokines, Macrophage Inflammatory
protein 1β (MIP-1β, clone 4), and a homologue of
TCA3 (hereafter referred to as cTCA, chicken T cell
activation protein, clone 391); clone 44, ornithine
decarboxylase antizyme (ODC-Az), a key regulator
of ornithine decarboxylase which is constitutively
activated in various transformed cells (Auvinen et
al., 1992); clone 59, IκBα; clone 71, a member of the
STAT family of signal transducers and activators of
transcription, Stat1; clone 80, a chicken homologue of
the mammalian Stem Cell Antigen-2 (Sca-2); clone
214, a regulatory subunit of protein phosphatase 2A



Table 1 cDNA clones isolated from the subtraction libraries

          Expression in
Clone     v-RelER cells      Size mRNA
number    (+Er)    (-Er)     (kb)        Sequence homology

4         ++                 1.0         MIP-1β (mouse)
44        ++                 1.6         ODC-Az (rat)
59        ++                 3.0         IκBα (chicken)
71        +++                >4.0        Stat1 (human)
80        +++                1.3         Sca-2 (mouse)
214       ++                 2.4/4.0     PP2A (rabbit)
220       ++                 2.0         ARP (Drosophila)
256       +++                3.9         NF-κB1 (chicken)
391       +++                0.8         TCA3 (mouse)
393       +++                1.1         CAP-23 (chicken)
31                 ++        1.8/4.0     eIF-2α (human)
513       +                  2.6         NAP1 (mouse)

Expression levels were classified into four groups as follows: +++,
0.2-0.4% of total mRNA, strong expression; ++, 0.04-0.1%,
moderate expression; +, 0.02% and less, weak expression; -, no
detectable expression



(PP2A); clone 220, a homologue of the maternally
loaded Drosophila embryo actin-related protein (ARP)
that plays a role in early embryogenesis (Frankel et
al., 1994); NF-κB1, the Rel/NF-κB family member;
CAP-23, a cortical cytoskeleton-associated protein
found in developing neural tissues (Widmer and
Caroni, 1990).

Of the clones that were reproducibly more abundant
in the v-RelER cells withdrawn from estrogen, we were
able to identify two different sequences. These genes
encode translation initiation factor-2α (eIF-2α), which
promotes binding of initiator tRNAs to 40S ribosomal
subunits (Ernst et al., 1987), and nucleosome assembly
protein-1 (NAP-1) involved in transcription factor
binding and nucleosome displacement (Walter et al.,
1995).

Expression of differentially regulated genes in v-RelER 
cells 

The differential expression of identified genes in
v-RelER cells was confirmed by Northern blot analysis
of mRNAs prepared from cells grown in the presence
or absence of estrogen. Because nfkb1 was isolated in
this screen as a potential v-Rel-regulated gene, the
expression of other Rel/NF-κB family members was
also examined, including c-rel, rela, relb and nfkb2. In
addition, we examined the expression of a gene which
decreases during hematopoietic cell differentiation,
c-myb (Graf, 1992). One class of genes, including ctca,
mip-1β, stat1, nfkb1 and nfkb2, was expressed in
v-RelER cells grown in the presence of estrogen, and
significantly downregulated within 1 day of estrogen
withdrawal (see Figure 1). The expression of a second
class of genes, typified by cap-23, sca-2, ikba and
c-rel, decreased less dramatically upon estrogen
withdrawal, suggesting that additional factors are involved
in their regulation. Two other Rel family members,
rela and relb, were expressed below the level of
detection in v-RelER cells (data not shown). The
expression of three genes, eif-2α, nap-1, and c-myb,
increased upon withdrawal of v-RelER cells from
estrogen. c-Myb, a sequence-specific DNA-binding
protein, is thought to regulate genes whose expression
is incompatible with cellular differentiation as
reflected in its expression pattern and biological
properties (Graf, 1992). Surprisingly, we found that
c-myb mRNA increased upon estrogen withdrawal,
conditions under which v-RelER cells begin to
differentiate into dendritic and neutrophilic cells
(Boehmelt et al., 1995).

Transcriptional induction of rel-regulated genes

To further investigate the possibility that genes isolated
in this screen are under transcriptional control by
v-Rel, the v-RelER cells maintained without estrogen for
3 days were estrogen-induced for various periods of
time. Subsequently, expression of the corresponding
mRNAs was analysed. While not dividing, v-RelER
bone marrow cells withdrawn from estrogen were
metabolically active, as judged by their ability to
revert phenotypically upon the readdition of estrogen
(Boehmelt et al., 1992) and by expression of β-actin,
GAPDH, vimentin, MHC class II mRNAs at levels
equivalent to hormone induced cells (data not shown).
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Figure 1 Expression of differentially regulated genes in v-RelER bone marrow cells grown in the presence of estrogen (Er + ). or 
without estrogen for I day (Er-). Genomic (G) RCASv-RelER and spliced (S) w-relER mRNAs are indicated 



As expected from the results described above, mip-1β,
ctca and nfkb1 expression was restored within 4 h of
estrogen addition (see Figure 2). Similar results were
obtained with nfkb1. These data are consistent with the
time course of activation observed in our previous
work on the localization and DNA binding properties
of estrogen-stimulated v-RelER (Boehmelt et al., 1992).
A more complicated picture was observed for the
induction of sca-2 mRNA. Although expression of
sca-2 was dependent on estrogen-activated v-RelER, its
levels were less responsive to estrogen activation,
demonstrating delayed kinetics of activation. Other
examined genes, including arp, cap-23, pp2A and stat1,
were not rapidly induced by hormone, indicating that
they were not directly regulated by v-RelER (data not
shown).

Expression patterns of rel-regulated genes

To exclude the possibility that the expression of the
isolated genes reflected a general transformed state
rather than v-Rel transformation, the expression
patterns were analysed in hematopoietic cells
transformed by various oncogenes. In addition to genes
rapidly induced by hormone, ikba and three other genes
encoding transcription factors whose expression was
altered in v-RelER cells, c-rel, c-myb and stat1, were
included in this analysis (see Table 2). Two genes
downregulated in estrogen-activated v-RelER cells,
eif-2α and nap-1, were found at similar levels in most
lines tested, including v-Rel-transformed NPB4 cells.
Mip-1β, c-rel and c-myb were expressed at elevated
levels in v-Ski-transformed cells. Other examined
genes, including ctca, nfkb1, nfkb2, stat1, sca-2 and
ikba, were found at higher levels in NPB4 cells,
pointing to their potential role in v-Rel-mediated
transformation. No evidence for lineage-specific
expression of these genes could be observed. Instead,
v-Ski-transformed precursors for the erythroid and
myeloid lineages displayed elevated levels of several of
the same genes as v-Rel-transformed lymphoid cells
(see Table 2), which may be due to the elevated levels
of c-Rel in v-Ski cells.

Previous studies established that the target cell for
v-Rel transformation depends on the virus complex used
for infection. It has been reported that v-Rel
transforms mature IgM-positive B lymphocytes, an
IgM-negative cell within the T cell or myeloid lineages,
or a cell expressing markers of the myeloid and B cell
lineages (Barth and Humphries, 1988; Barth et al.,
1990; Morrison et al., 1991). To provide evidence for a
correlation between expression of these genes and
v-Rel transformation, we assayed gene expression in a
variety of v-Rel-transformed cells derived from
different hematopoietic lineages. As can be seen from
Table 3, which summarizes this analysis, some
variations in the expression levels were found,
particularly with mip-1β and ctca. Thus, ctca was
expressed in 8 out of 10 cell lines at variable levels,
while mip-1β was found in most lines at low levels.
Two other genes, c-myb and nfkb2, were expressed at
low levels in most tested cells (data not shown). In
contrast, nfkb1, stat1, ikba and sca-2 were found at
elevated levels in almost all cells tested, suggesting that
overexpression of these genes is characteristic of the
v-Rel-induced phenotype.

These experiments also allowed us to compare the
ability of v-Rel and overexpressed or truncated c-Rel
to induce altered gene expression. Two hematopoietic
cell lines, B-1 and 189/5, were examined which
contain both wild-type and carboxy terminal truncated
forms of c-Rel (Hrdlickova et al., 1994). Each
cell line expresses high levels of all genes tested (see
Table 3). Because B-1 cells contain high levels of
both wild-type and truncated forms of c-Rel, it is not
clear if transcriptional activation and transformation
in these cells result from wild-type or mutant c-Rel
activity. In contrast, 189/5 cells contain low levels of
wild-type c-Rel and high levels of the carboxy terminal
truncated form of c-Rel, suggesting that altered gene
expression results from the truncated version of
c-Rel.

Figure 2 Time course of gene induction by estrogen-activated
v-RelER. The v-RelER-transformed bone marrow cells grown in
the presence of estrogen (+) were withdrawn from hormone for 3
days (0). The cells were subsequently estrogen induced for the
indicated periods of time over the period of 4 h and cellular
RNAs were probed with the corresponding cDNAs indicated on
the right



Induction of gene expression by carboxy terminal 
mutants of v-rel 

Examination of the biological properties of v-Rel
mutants has shown that transformation requires
sequences within the amino terminal Rel Homology
Domain. Small carboxy terminal deletions have a
marginal effect on the transforming properties of the
protein (Sarkar and Gilmore, 1993; Smardova et al.,
1995). In our previous studies, deletions approximately
100 bp in length were made throughout v-rel and the
mutant proteins were analysed in CEF and BM cells
(see Materials and methods). All deletion mutants
within the rel-homology domain (dl2-dl8) were
transformation-defective, as none of them were able
to bind to DNA. However, mutants which lie outside
the rel-homology domain (dl12-dl17) retained the
ability to bind to DNA, activate transcription of
cellular genes, and transform fibroblasts and BM cells
to different degrees (Morrison et al., 1992; Smardova et
al., 1995). Taking advantage of these mutants, we
correlated gene expression with the transforming ability
of v-rel. Bone marrow from 4-10-day old chicks was
infected with wild-type v-rel or carboxy terminal v-rel
deletion mutants (dl12, 13, 15-17) and subsequently
maintained in liquid culture. All cultures expanded to
approximately 10* cells within several weeks. Analysis
of gene expression showed that mip-1β, nfkb1 and stat1
were expressed in all cells tested, though some
variations in expression levels were observed (see
Figure 3). In contrast, most v-rel mutations resulted
in low levels of ctca and sca-2 expression. In our
previous work, decreased levels of two other
Rel-regulated genes, MHC class I and HMG 14b, were
observed in bone marrow cells infected with the
carboxy terminal v-rel deletion mutants. Though it is
possible that each deletion mutant transformed a
different cell type, the phenotypic analysis of these
cells suggested that this is not the case (Smardova et
al., 1995).

Expression of rel-regulated genes in CEFs

Wild-type v-Rel and estrogen-activated v-RelER
confer characteristic growth properties and morphology
on primary CEFs (Morrison et al., 1991;
Boehmelt et al., 1992). Because nontransforming
mutants of v-rel do not promote growth of bone



Table 2 Expression of the identified clones in transformed hematopoietic cell lines 

                NPB4        DT95        HP50        v-ski BM              BM2        HD11       HD3
                (lymphoid)  (lymphoid)  (lymphoid)  (myeloid, erythroid)  (myeloid)  (myeloid)  (erythroid)
Clone number    v-rel       ND*         ND*         v-ski                 v-myb      v-myc      v-erbA/ts-v-erbB


MIP-1β


+ 






+ + 




+ 




cTCA 


+ + + 














NF-κB1


+ + 


+ 


+ 




+ 


+ 




NF-κB2
















Sca-2 


+ + 


+ 




+ 






+ 


IκBα






+ 




+ 


+ 


+ 


c-Rel 


+ 


+ 




+ + 






+ 


Stat1


+ +






+ + 








c-Myb 








+ + 








eIF-2α


+ 


NA 






+ 


+


NA 


NAP-1


+ +


NA 






+ 




NA 



ND*, lymphoid cell lines derived from ALV induced tumors; NA, not analysed. For explanation of expression levels see footnotes to Table 1

Table 3 Expression of genes in v-Rel transformed cells



Cell line


Virus


Tissue and lineage

derivation 


v-Rel
i 


c-Rel 


NF-κB1

7 

5 


A 


Stat1

c 

J 


0 


cTCA 

7 
/ 


Sca-2 

o 
o 


RCAS-1 


RCASv-Rel 


BM, myeloid 






+ + 


+ + 




+ 




+ + + 


RCAS-2 


RCASv-Rel 


BM, myeloid 


+ + 




+ 


+ + 


+ + + 






+ +


NPB4 


REV-T/REV-A 


BM, lymphoid 


+ + 




+ + 


+ + 


+ +


+ 


+ + +




BM1


REV-T/REV-A 


BM, lymphoid 


+ + 




+ + 








+ + 




tu-1


REV-T/REV-A 


embryo liver, nd 


+ + + 




+ +




+ + 




+ +


+ + 


tu-2 


REV-T/REV-A 


embryo liver, nd 


+ + + 




+ +


+ + 






+ + 


+ + 


SS-1 


REV-T/CSV 


spleen, B-lymphoid


+ + + 




+ +




+ + +


+ 




+ +


123/6T 


REV-TW/REV-A 


spleen, nonB/nonT 


+ + +


+ 


+ + 


+ + 








+ + 


123/6 


REV-TW/CSV 


spleen, nonB/nonT 


+ + + 




+ +


+ + 


+ + 




+ 


+ + 


160/2 


REV-TW/CSV 


spleen, T-lymphoid


+ + 


+ 


+ + 


+ +


+ + 


+ 


+ 


+ + 


189/5 


REV-C/CSV 


spleen, T-lymphoid 




+ + +


+ + 


+ + 


+ + + 


+ + 


+ + +




B-1 


REV-C/CSV 


spleen, T-lymphoid 




+ + +


+ + + 


+ + 


+ + + 


+ + 


+ + +


+ + + 



BM, bone marrow; nd, not defined. For explanation of expression levels see footnotes to Table 1

Figure 3 Northern blot analysis of RNA isolated from bone
marrow cells infected with v-rel carboxy terminal deletion mutants
dl12 (lane 1), dl13 (lane 2), dl15 (lane 3), dl16 (lane 4), dl17 (lane
5), or wild-type v-rel (lane 6). An equivalent filter was probed
with GAPDH to control for equal loading

[Figure 4 panel labels: cytochrome c oxidase loading control; transforming activity; v-rel and deletion mutants dl1, dl2, dl4, dl6, dl8, dl12, dl15, dl16; genomic (G) and spliced (S) v-Rel mRNAs]



Figure 4 Analysis of mip-1β, sca-2 and nfkb1 expression in
primary CEFs. (a) Induction of gene expression in CEFs
transfected with RCAS (lane 1), RCASv-Rel (lane 2),
RCASv-RelER (cells grown in the absence of estrogen, lane 3),
RCASv-RelER (cells grown in the presence of estrogen, lane 4), or
RCASc-Rel (lane 5). A filter containing each of the RNAs was
probed with the mitochondrial cytochrome oxidase cDNA to
control for equal loading. (b) Induction of gene expression in
CEFs infected with wild-type v-rel or v-rel deletion mutants.
Genomic (G) RCASv-Rel and spliced (S) v-rel mRNAs are
indicated



marrow cells, we examined gene expression in avian
fibroblasts infected with mutant forms of v-rel.
Initially, the inducibility of mip-1β, sca-2 and nfkb1
was tested in v-RelER fibroblasts maintained with or
without estrogen. As can be seen in Figure 4, all three
mRNAs were expressed in v-RelER cells grown in the
presence of estrogen. Fibroblasts overexpressing v-Rel
and wild-type c-Rel were also analysed and found to
express each of the tested genes (Figure 4a). In
contrast, ctca, nfkb2 and stat1 were not found in
fibroblasts, presumably because hematopoietic
cell-specific factors are required for their expression
(data not shown).

Next, gene expression was examined in fibroblasts
expressing transforming (v-rel, dl-1, dl-12, dl-15 and
dl-16) or nontransforming mutants of v-rel (dl-2, dl-4,
dl-6 and dl-8). As can be seen in Figure 4b, the
expression of mip-1β and nfkb1 was evident only in cells
producing wild-type v-rel, the dl-1 mutant, which lacks
amino terminal viral env sequences, or carboxy
terminal transforming v-rel mutants (dl12-dl16).
Nontransforming mutants within the Rel Homology
Domain (dl2-dl8) did not induce expression of these
genes. Interestingly, while the expression of sca-2 was
inducible in fibroblasts, it was only found in wild-type
v-rel-transformed cells. All v-rel mutants failed to
induce expression of this gene, reflecting the complexity
of the protein structure and interactions required
for full function of v-Rel.

Characterization of viral-transduced fibroblasts 

To further investigate the role of MIP-1β and
NF-κB1 in cellular growth control, the effect of
overexpression of these genes was studied in chicken
fibroblasts. The corresponding cDNAs were subcloned
into the retroviral vector pCRNCM, downstream
of the CMV promoter. These constructs,
together with c-rel and c-kit cDNAs used as
controls, were introduced into chicken embryo
fibroblasts. The recombinant viruses were subsequently
produced by infecting the cells with the
helper transformation-defective virus tdB77. CEFs
infected with the recombinant viruses were selected
in Geneticin and characterized on the RNA or
protein levels. To coexpress the corresponding rel
proteins, the cells were superinfected with either
RCASv-Rel or RCASc-Rel (see Figure 5).

Analysis of the growth phenotypes revealed that the
fibroblasts overexpressing nfkb1 maintained the growth
properties characteristic of control pCRNCM-transduced
CEFs. These cells reproducibly grew for 20-25
passages, as did control cells; after that they became
vacuolated and died (see Table 4). In contrast,
overexpression of mip-1β extended the life span of
CEFs. In addition, the mip-1β fibroblasts efficiently
proliferated in low serum and displayed the
anchorage-independent growth in soft agar that is
characteristic of transformed cells. Coexpression of
v-rel in these cells further promoted their ability to
form colonies in soft agar.

As reported earlier, overexpression of c-rel
morphologically transforms CEFs (Abbadie et al., 1993;
Kralova et al., 1994). Although c-rel fibroblasts do
not form colonies in soft agar, they do display
characteristic disruption of the cellular cytoskeleton
and a remarkable extension of life span. Interestingly,
CEFs overexpressing both c-rel and mip-1β retained the
major growth properties of the c-rel fibroblasts. In
addition, they efficiently formed colonies in soft agar,
revealing a fully transformed phenotype (see Table 4).
In sharp contrast, coexpression of c-rel and v-rel in
fibroblasts resulted in decreased cell growth, consistent
with our previous study which showed that v-Rel and
c-Rel interfere with the DNA binding properties of
each other (Hodgson and Enrietto, 1995).



Discussion 

Most evidence available to date indicates that 
transformation by v-Rel results from its capacity to 
form homodimers, enter the nucleus, and bind DNA. 
While early studies suggested that v-Rel could repress 
transcription, presumably by interfering with c-Rel 



Table 4 Growth properties of viral-transduced fibroblasts

Transfected               Growth rate           Colony formation   Life span
cDNA†        Virus        (h/cell division)     in soft agar       (passages)**

pCRNCM       tdB77        48                                       20-25
c-Rel        tdB77        56                                       30-35
NF-κB1       tdB77        46                                       20-25
MIP-1β       tdB77        42                    8%                 35-40
c-Kit        tdB77        38                    3%                 nd
RCASv-Rel*   RCASv-Rel    39                                       50-60
c-Rel        RCASv-Rel    68                                       10-15
NF-κB1       RCASv-Rel    46                                       25-30
MIP-1β       RCASv-Rel    42                    15%                40-45
c-Kit        RCASv-Rel    42                    6%                 nd
RCASc-Rel*   RCASc-Rel    50                                       40-50
NF-κB1       RCASc-Rel    44                                       25-30
MIP-1β       RCASc-Rel    42                    20%                40-45
c-Kit        RCASc-Rel    39                    5%                 nd

†All cDNAs were subcloned into the pCRNCM retroviral vector,
except where indicated (*). To coexpress the corresponding rel
proteins, transfected CEFs were superinfected with either RCASv-Rel
or RCASc-Rel. **Cells that grew for 20-25 passages were in culture
approximately 2 months. Cells that grew for 50-60 passages were in
culture for over 6 months

Figure 5 Analysis of gene expression in viral-transduced fibroblasts. (a) Northern blot analysis of RNA isolated from normal CEFs
(lanes 1 and 3), NF-κB1 CEFs (lane 2), and MIP-1β CEFs (lane 4). Lanes 1 and 2 were hybridized with the nfkb1 probe; lanes 3
and 4 were probed with mip-1β. (b) Immunoblot analysis of Rel expression in RCASv-Rel CEFs (lane 1); NF-κB1 CEFs and
MIP-1β CEFs superinfected with RCASv-Rel (lanes 3 and 4, respectively); RCASc-Rel CEFs (lane 5); NF-κB1 CEFs and MIP-1β CEFs
superinfected with RCASc-Rel (lanes 6 and 7, respectively); c-Rel CEFs (lane 8); c-Rel CEFs superinfected with RCASv-Rel (lane
9). Expression in nontransfected CEFs (lane 2) is shown as a control



activity (Gilmore, 1990; Bose, 1992), more recent
work points to a role for v-Rel in the transcriptional
activation of a number of endogenous genes important
for hematopoietic cell growth and differentiation
(Baeuerle and Baltimore, 1996; Gilmore et al., 1996).

These studies and our previous work prompted us to
develop a more complete picture of the genes whose
expression might contribute to the v-Rel-induced
leukemic phenotype. Taking advantage of the conditional
Rel variant, subtraction cDNA libraries were
constructed from v-RelER-transformed bone marrow
cells grown in the presence or absence of estrogen.
Examination of the expression patterns of the isolated
genes suggested that several of them might be directly
induced by v-Rel upon transformation.

Two Rel/NF-κB family members were upregulated
in estrogen-stimulated v-RelER cells, nfkb1 and nfkb2.
Activation of these two genes in lymphoid cells
expressing either v-Rel or c-Rel was reported earlier
(Hrdlickova et al., 1995). Recent evidence suggests that
heterodimers of NF-κB2/p52 with v-Rel can play a role
in v-Rel-mediated transformation. Thus, heterodimers
containing p52 and nontransforming mutants of v-Rel
that cannot form homodimers can transform chicken
spleen cells (White et al., 1996). In addition,
coexpression of aberrant versions of p52 increases the
oncogenicity of c-Rel proteins with carboxy terminal
deletions (Gilmore et al., 1996). While our results
demonstrate that the nfkb2 expression levels remained
low in most examined cell lines, the expression of nfkb1
was high in all examined v-Rel-transformed
hematopoietic cells and correlated with v-Rel
transformation in mutant v-rel-infected cells. A number
of studies indicate that NF-κB1/p105 acts as a
cytoplasmic retention molecule for the Rel/NF-κB
proteins (Beg and Baldwin, 1993; Grilli et al., 1993).
On the other hand, the processing of p105 may lead to
different amounts of p50 homo- and heterodimers in
the cell. While in v-Rel-transgenic mice NF-κB1 is not
required for transformation (Carrasco et al., 1996),
previous studies have associated elevated levels of p50
with several lymphoid and non-lymphoid malignancies
(Mukhopadhyay et al., 1995; Bargou et al., 1996).
NF-κB1/p50 may contribute to oncogenesis by forming
heterodimeric complexes with v-Rel, c-Rel, and p52,
which function as potent transcription regulators
(Siebenlist et al., 1994; Baldwin, 1996). The cells
expressing constitutively active NF-κB, such as B cells
or HTLV-I-transformed T cells, contain primarily
c-Rel/p50 heterodimers in their nuclei, while in
transformed cells, v-Rel is predominantly complexed
with p50 and IκBα (Liou et al., 1994; Miyamoto et al.,
1994; Gilmore et al., 1996). The prevalence of
constitutively active v-Rel/p50 in these cells may
explain the upregulation of NF-κB target genes such
as cytokines and their receptors, MHC, and cell
adhesion molecules. Thus, constitutive expression of
nfkb1 induced by v-Rel may result in 'constitutive'
activation of the transformed cell.

Two isolated cDNAs, clones 4 and 391, revealed
sequence homology with the family of chemoattractant
cytokines. Clone 4 is a chicken homologue of the
mammalian macrophage inflammatory protein-1 beta,
while clone 391 is the homologue of the mouse T cell
activation protein 3, a cytokine structurally related to
MIP-1β and IL-8 (Burd et al., 1987). Examination of
the expression patterns of both cytokines revealed that
most v-Rel-transformed hematopoietic cells, regardless
of lineage, express mip-1β. Its expression also
correlated with transformation when v-rel deletion
mutants were examined in bone marrow cells and
fibroblasts. In contrast, ctca was not expressed in
v-Rel-transformed fibroblasts. In addition, its expression
failed to correlate with transformation in bone marrow
cells, suggesting that it is not required for the
maintenance of v-Rel-mediated transformation. In
summary, our data argue for a role for mip-1β in
transformation by v-Rel and a direct role for v-Rel in
its transcriptional regulation.

Previous studies showed that MIP-1β is a potent
chemoattractant for monocytes and specific
subpopulations of lymphocytes (Taub et al., 1995; Lloyd et
al., 1996). Specifically, MIP-1β stimulates T cell
proliferation and induces actin polymerization and
profound cytoskeletal changes in T cells within
seconds of exposure (Adams et al., 1994). It also
exhibits growth regulatory properties for hematopoietic
stem and progenitor cells, costimulating myelopoiesis
and antagonizing the growth inhibitory activity
of MIP-1α (Graham et al., 1990). Our data indicate
that overexpression of mip-1β in avian fibroblasts
induces cellular transformation as measured by
growth in soft agar and extended life span of the
cells. However, in most examined v-Rel-transformed
hematopoietic cell lines, mip-1β was expressed at low
to moderate levels. Therefore, its relevance to the
transforming process of v-Rel is yet to be
demonstrated.

The role of the chicken Sca-2 homologue in
transformation remains unclear, in part because its
biological and biochemical properties are not well
defined. Sca-2 expression failed to correlate with v-Rel
transformation when v-rel deletion mutants were
examined in fibroblast and hematopoietic cells. In
addition, its Rel-inducibility was not clear in v-RelER
cells. However, we found that infection of chicken
lymphoid or erythroid cell lines with RCASv-Rel
induces the expression of sca-2 (data not shown).
These results imply that the regulation of sca-2
expression is complex and may involve factors other
than v-Rel. Mammalian Sca-2 is normally present on a
subset of immature thymic blasts and bone marrow
cells that repopulate the thymus (Spangrude et al.,
1989). Recent data suggest a role for Sca-2 in T cell
activation and protection from TCR-mediated apoptosis
(Saitoh et al., 1995; Noda et al., 1996). Because sca-2
was found at high levels in all v-Rel-transformed
hematopoietic cells, we cannot exclude the possibility
that it plays a role in the generation of the
v-Rel-induced phenotype.

Several v-Rel target genes have been described so
far, including HMG-14b and MHC class I in bone
marrow cells (Boehmelt et al., 1992; Walker and
Enrietto, 1996); NF-κB1 and IκBα in fibroblasts and
lymphoid lines (Hrdlickova et al., 1995; Schatzle et
al., 1995); and IL-2R, DM-GRASP, p75, and MHC class I
and II, identified through introduction of v-rel into
v-myc-transformed lymphoid cell lines (Hrdlickova et
al., 1994; Zhang et al., 1995; Zhang and Humphries,
1996). In contrast, MHC class I and IκBα were not
upregulated in tumor cells from v-Rel-transgenic mice
(Carrasco et al., 1996). Our previous work showed
that MHC class II was not regulated by v-Rel in
transformed bone marrow cells (Walker and Enrietto,
1996). While the subset of Rel-regulated genes may
vary in different cell types, the data presented here
indicate that v-Rel acts by stimulating the expression
of a number of genes, some of which are regulated by
c-Rel. These alterations may be the result of the direct
action of v-Rel in regulation of genes such as mip-1β,
inappropriate expression of which may potentiate
growth of the transformed cell. Alternatively, v-Rel
may act indirectly, altering the expression of genes
through the upregulation of other transcription
factors, such as NF-κB1. Most likely, development
of the v-Rel-induced transformed phenotype is the
sum of all these changes, each of which may have
profound biological consequences for the target cell.
For this reason, we are currently assessing the
contribution of each of these genes to the transformed
phenotype.

In the course of this work, we also examined the
effect of overexpression of wild-type c-Rel on the
genes identified in this screen. Mip-1β, sca-2 and
nfkb1 were all upregulated in c-Rel-overexpressing
CEF, suggesting that genes normally controlled by
c-Rel are targets for v-Rel regulation. Since both v-Rel
and c-Rel overexpression result in morphological
transformation of CEF, perhaps expression of this
set of genes is required to mediate fibroblast
transformation. Two c-Rel-transformed spleen cell
lines were also examined in this study and found to
have elevated levels of nfkb1, stat1, mip-1β and
ctca. Because these cell lines also contain truncated
forms of c-Rel (Hrdlickova et al., 1994), it is not clear
if transcriptional activation and transformation in
both of them result from wild-type or mutant c-Rel
activity. We previously reported that overexpression
of wild-type c-Rel in bone marrow cells, which appear
to be granulocytic in origin (unpublished observation),
leads to cell death (Abbadie et al., 1993). For this
reason, it will be important to determine if the genes
identified in this study are upregulated in the
granulocytic bone marrow target prior to the onset
of cell death.



Materials and methods 

Cells and tissue culture 

CEFs expressing different viruses were prepared using the
calcium-phosphate/DNA transfection method (Chen and
Okayama, 1987). Generation of v-Rel and v-RelER
transformed bone marrow (BM) cells was previously
described (Morrison et al., 1991; Boehmelt et al., 1992).
The origin of other cell lines is as follows: BM2, an
immature myeloid cell line transformed by v-Myb
(Moscovici et al., 1977); HD11, a myeloid cell line
transformed by v-Myc (Leutz et al., 1984); HP50 and
DT95, lymphoid cell lines derived from ALV induced
tumors (Kabrun et al., 1990; Hrdlickova et al., 1994); v-Ski
transformed precursors for the erythroid and myeloid
lineages (Larsen et al., 1993); HD3 erythroblasts
transformed by v-ErbA/ts-v-ErbB (Schmidt et al., 1986); 189/5
and B-1, T-lymphoid lines transformed by c-Rel
(Hrdlickova et al., 1994); BM1, tu-1, tu-2 (P Enrietto,
unpublished), NPB4 (Beug et al., 1981), SS-1, 123/6, 123/6T, and
160/2 (Hrdlickova et al., 1994) are v-Rel transformed cell
lines.



v-rel deletion mutants 

Generation and characterization of v-rel deletion mutants
have been described previously (Morrison et al., 1992;
Smardova et al., 1995). Briefly, deletions approximately
100 bp in length were made by oligonucleotide-directed
mutagenesis throughout v-rel. Each mutant was cloned
into the RCAS vector for analysis in CEFs and BM cells.
All deletion mutants between nucleotides 37 and 798 in
v-rel (dl2-dl8 in this study) were transformation defective.
Mutants that lie outside this region retained the ability to
transform fibroblasts and BM cells to different degrees (dl1,
dl12-dl17).

Characterization of growth properties 

The cDNAs encoding mip-1β, nfkb1, c-rel or c-kit were
subcloned into the pCRNCM retroviral vector (Metz et
al., 1991), downstream of the CMV promoter. The
retroviral constructs were transfected into CEFs and
selected in Geneticin. Recombinant viruses were produced
by infecting the cells with tdB77 transformation-defective
helper virus. The viruses were harvested after 4 days of
cultivation. CEFs infected with the recombinant viruses
were selected in Geneticin and characterized for the
corresponding RNA or protein levels. Proliferation of
fibroblasts was measured in DMEM supplemented with
1% bovine serum and 0.2% chicken serum by direct
counting of trypsinized cells using a Coulter counter.
Proliferation related to tumorigenesis was examined in
soft agar medium composed of DMEM, 6% fetal calf
serum, 2% chicken serum, 1 x MEM vitamin solution,
penicillin (100 u/ml), streptomycin (100 µg/ml), and
nystatin (100 u/ml). Trypsinized cells were mixed with a
0.25% agar-medium solution to a concentration of 2 x 10*
cells/ml and 3 ml of each cell mixture was added to
60 mm Petri dishes prepared with an initial 3 ml underlay
of 0.7% agar. Cells were grown for 14-16 days in a
humidified incubator at 37°C under 5% CO2. Proliferation
was evaluated by the microscopic counting of individual
colonies (average of four experiments).
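
The two growth measurements above can be illustrated with a short calculation. This is a minimal sketch, not the authors' procedure: it assumes exponential growth between two Coulter counts to obtain hours per cell division, and it assumes that the soft agar percentage is the mean colony count divided by the number of cells plated. The function names and all input values are hypothetical.

```python
import math


def doubling_time_hours(n_start, n_end, elapsed_hours):
    """Hours per cell division, assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start or elapsed_hours <= 0:
        raise ValueError("need n_end > n_start > 0 and a positive interval")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


def colony_forming_efficiency(colony_counts, cells_per_ml, volume_ml=3.0):
    """Mean percentage of plated cells that formed colonies.

    colony_counts holds one count per independent experiment; the number of
    cells plated per dish is cells_per_ml * volume_ml (assumed definition).
    """
    if not colony_counts:
        raise ValueError("at least one colony count is required")
    cells_plated = cells_per_ml * volume_ml
    mean_colonies = sum(colony_counts) / len(colony_counts)
    return 100.0 * mean_colonies / cells_plated


if __name__ == "__main__":
    # Hypothetical counts: a fourfold increase over 96 h is two divisions.
    print(doubling_time_hours(1e5, 4e5, 96))
    # Hypothetical colony counts from four experiments, 1000 cells per dish.
    print(colony_forming_efficiency([80, 90, 70, 80], cells_per_ml=1000, volume_ml=1.0))
```

Averaging the colony counts before dividing mirrors the "average of four experiments" wording; per-experiment percentages could be averaged instead if plating densities differed.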

Construction and screening of cDNA libraries 

The poly(A)+ RNA prepared from v-RelER-transformed
BM cells was copied into cDNAs and cloned into the
EcoRI or NotI site of lambda ZAPII vector (Stratagene).
Two subtraction cDNA libraries were prepared: one
enriched for v-RelER-induced sequences (library I), and
the other enriched for the sequences downregulated by
v-RelER (library II). In vitro excision of single-stranded
Bluescript phagemids containing cDNA inserts,
biotinylation and subtractions were performed as described
(Schweinfest et al., 1990). Following subtractive
hybridization, the phagemids were transfected into E. coli
XL1-Blue cells (Stratagene). Approximately 600-700 bacterial
colonies were obtained as a result of each subtraction.
Bacterial clones were tested by differential Northern
hybridization with cellular RNAs and subjected to
sequence analysis. DNA sequencing of double-stranded
plasmid templates was performed using the Sequenase kit
(USB). At least 400 bp of each clone were sequenced to
determine homologies. Nucleotide sequences were analysed
using Blast database search programs (NCBI).

Northern blot hybridization 

RNAs prepared using the acid guanidinium thiocyanate-
phenol extraction procedure (Chomczynski and Sacchi,
1987) were separated on formaldehyde-agarose gels and
blotted onto Hybond-N nylon membranes (Amersham).
Hybridization probes were derived by PCR
amplification of the cloned full-length cDNAs encoding
MIP-1β, cTCA, Sca-2, CAP-23 and PP2A; partial cDNAs
for ARP, eIF-2α, NAP-1, ODC-Az and Stat-1, including
the corresponding coding and 3'-untranslated regions; and a
2.5 kb long NF-κB1 cDNA (clone 256) that lacks the 5'
rel-homology region. Other hybridization probes included
v-rel in the plasmid pBSrelCS (Morrison et al., 1991);
c-rel in the plasmid pBSc-Rel (Abbadie et al., 1993); c-myb
in the plasmid pneoCCC (kindly provided by Dr J
Lipsick); MHC class I cDNA clone B-Fl2aF10
(Boehmelt et al., 1992); MHC class II cDNA clone BLBbll
(kindly provided by Dr Ch Auffray); NF-κB1, NF-κB2
and RelB cDNAs were kindly provided by Dr TD
Gilmore; IκBα cDNA was kindly provided by Dr IM
Verma.

Representative Northern blots were quantified using an
Ultroscan laser densitometer (LKB). The levels were
normalized to the density of hybridization of the serial
dilutions of the corresponding cDNA plasmids run in
parallel with the RNA samples. Expression levels of c-Rel,
v-Rel, NF-κB1 and Stat-1 in transformed hematopoietic cell
lines were verified by immunoblot analysis using the
corresponding antibodies.
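
The normalization step can be pictured as reading each sample band off a standard curve built from the plasmid dilution series. The sketch below is one plausible way to do this, not the authors' documented calculation: it assumes the densitometer signal is linear in the amount of plasmid over the range used, fits a least-squares line to the standards, and inverts it for each sample. The function names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired standard points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standard amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_signal(sample_density, std_amounts, std_densities):
    """Convert a band density into plasmid-equivalent units via the standard curve."""
    slope, intercept = fit_line(std_amounts, std_densities)
    if slope <= 0:
        raise ValueError("standard curve must increase with plasmid amount")
    return (sample_density - intercept) / slope


if __name__ == "__main__":
    # Hypothetical twofold dilution series (arbitrary units) and its densities.
    amounts = [1, 2, 4, 8]
    densities = [10, 20, 40, 80]
    print(normalize_signal(30, amounts, densities))
```

Expressing each band in plasmid-equivalent units makes blots probed with different cDNAs comparable, provided the sample densities fall within the range spanned by the standards.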



Western blot analysis 

Protein extracts for Western blot analysis were prepared as 
described (Morrison et al., 1991). The antibodies used were
SB66 Rel-specific polyclonal antibody (Boehmelt et al.,
1992), a polyclonal anti-avian NF-kB1 antibody (kindly
provided by Dr HR Bose), and anti-Stat1 mAb
(Transduction Laboratories).



Accession numbers 

The sequences described have the following GenBank 
accession numbers: clone 4, L34553; clone 80, L34554; 
clone 391, L34552. 
PROTEIN STRUCTURE AND ANTIGENICITY

ABSTRACT

The antigenicity of a protein resides in a series of mutually overlapping surface patches
known as epitopes which make contact with the combining sites of antibody molecules. Epitopes
are usually localized by demonstrating the presence of antigenic cross-reactivity between a
protein and some peptide fragments. Since structural features of proteins such as the accessibility,
hydrophilicity and mobility of segments of the polypeptide chain have been correlated
with the location of epitopes, it is possible to predict from the primary structure which
linear peptides are likely to correspond to epitopes of the protein.

The antigenicity of a protein refers to its capacity to bind specifically to the functional
binding sites or paratopes of certain immunoglobulin molecules. Paratopes are made up of six
highly accessible loops of hypervariable sequence that interact to a greater or lesser extent
with the surface of the antigen. That portion of the antigen that comes into contact with the
paratope of the antibody constitutes an antigenic determinant or epitope of the antigen. In the
same way that the antibody nature of an immunoglobulin is identified only after its complementary
antigen has been recognized, the epitope nature of a cluster of amino acids in a protein
can be established only by using an immunoglobulin as a detecting device. An epitope is thus a
relational entity which can be defined only in a functional and operational sense through the
availability of complementary paratopes. The question to be addressed here is: to what extent
do structural features of a protein correlate with the location of its epitopes?

It has been customary to divide epitopes into a number of conceptual categories such as
sequential and conformational epitopes that are not easily distinguished experimentally. At
the present time, it is common to distinguish between continuous and discontinuous epitopes
(Benjamin et al., 1984 ; Berzofsky, 1985 ; Van Regenmortel, 1986). Continuous epitopes consist
of amino acid residues in direct peptide linkage while discontinuous epitopes are defined as a
cluster of residues that are not contiguous in sequence but are juxtaposed at the protein
surface by the folding of the polypeptide chain.

A common approach for identifying epitopes in a protein consists in measuring the ability
of natural or synthetic fragments of the molecule to react with antibodies raised against the
complete molecule. Any linear peptide of 5-10 residues that is found to react is labelled a
continuous epitope. However, such a label should not be taken to imply that the linear fragment
accurately mimics the complete structure of the corresponding epitope in the native protein.
In most cases, such a linear peptide is likely to represent only part of a larger discontinuous
epitope of the protein ; antibodies directed to a discontinuous epitope may indeed react,
albeit weakly, with subregions of the epitope made up of a few residues in linear sequence.
Although most of our knowledge of protein antigenicity is based on the identification of
continuous epitopes, it should be emphasized that they represent only incomplete and adulterated
versions of the original epitopes existing in the native protein. It is widely believed
that the majority of epitopes are of the discontinuous type (Benjamin et al., 1984 ; Berzofsky,
1985 ; Barlow et al., 1986), although until now only a single epitope of this type has been
fully delineated in lysozyme (Amit et al., 1986). In this case, 16 residues of lysozyme
(segments 18-27 and 116-129 of the sequence) were found by X-ray crystallography of antigen-
antibody complexes to be in contact with 17 residues of the paratope belonging to all six
complementarity-determining regions. The two complementary surfaces showed extensive interpenetration
and mutual interdigitation.

In general, information on discontinuous epitopes is extremely patchy, since only two or
three amino acids can usually be shown to be contact residues of the epitope ; this information
is obtained by showing that related proteins presenting substitutions at these positions are
discriminated by a particular monoclonal antibody (Benjamin et al., 1984 ; Van Regenmortel, 1984).

Many practical applications of immunological research are based on the exploitation of
antigenic cross-reactions, made possible for instance by raising antibodies against synthetic
peptides able to cross-react with the complete protein molecule (Lerner, 1984 ; Walter, 1986).

The most systematic way to look for continuous epitopes in proteins is to synthesize all
possible overlapping hexa-, hepta- or octapeptides of the protein, and then to measure their
capacity to bind to antiprotein antibodies. This is readily achieved by a method developed by
Geysen et al. (1984) in which the peptides are synthesized on a linear polymer of polyacrylic
acid and tested immunologically while still bound to the solid phase. The contribution to the
antibody-binding interaction of individual amino acids within a synthetic peptide can then be
examined by systematically substituting each residue by the other 19 possible amino acids
(Rodda et al., 1986). In this way, it can be shown that a certain number of residues of continuous
epitopes are essential for antigenicity (i.e. they cannot be replaced by any other amino
acid) while other residues can be replaced by all common amino acids without affecting peptide
binding by the antibody (Getzoff et al., 1987). When this method was applied to myoglobin
(Rodda et al., 1986), a new epitope undetected in earlier work (Atassi, 1984) was identified in
residues 48-55. However, several myoglobin epitopes identified previously by other immunochemical
techniques (Atassi, 1984) were not revealed by the method of Geysen. These discrepancies
demonstrate the operational nature of any definition of antigenicity resulting from
the fact that the type of probe as well as the particular immunoassay used greatly influence
the result. In a recent study of the histone H2A (Muller et al., 1986), it was shown, for
instance, that the antigenic activity of several synthetic peptides depended on whether they
were tested as free peptides in solution, conjugated to a carrier, or adsorbed to a plastic
solid phase. Such variations are probably due to variations in peptide accessibility and
conformation in the different assays.
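
The bookkeeping behind such a systematic scan can be sketched in a few lines. The following is only an illustration of the enumeration, not the synthesis protocol of Geysen et al.; the function names are hypothetical. A protein of N residues yields N - k + 1 overlapping peptides of length k, so that a 118-residue protein gives 113 hexapeptides, and each hexapeptide has 6 x 19 = 114 single-residue replacement analogues.

```python
from typing import List, Tuple

# One-letter codes of the 20 common amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def overlapping_peptides(sequence: str, length: int = 6) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for every window of `length`.

    Hypothetical helper: enumerates all overlapping peptides of a sequence,
    each shifted by one residue from the previous one.
    """
    if length < 1 or length > len(sequence):
        return []
    return [
        (start + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def single_substitutions(peptide: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, new residue, analogue) for every analogue in
    which exactly one residue is replaced by one of the other 19 amino acids."""
    analogues = []
    for index, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                analogue = peptide[:index] + residue + peptide[index + 1:]
                analogues.append((index + 1, residue, analogue))
    return analogues
```

Residues at which no replacement analogue retains binding would be the ones described above as essential for antigenicity; deciding this requires the binding measurements, which the sketch does not model.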

In recent years, there have been many attempts to correlate the position of continuous
epitopes in proteins with certain features of their primary, secondary or tertiary structures
(Parker et al., 1985). For instance, plots of hydrophilicity (Hopp, 1986) along the peptide
chain are often used to identify linear stretches of residues that are exposed to the solvent
and therefore likely to be antigenic.
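
A hydrophilicity plot of this kind can be sketched as a moving average of per-residue values along the chain. The example below is a minimal sketch under stated assumptions: the scale is the set of values usually quoted for Hopp and Woods (1981) and should be checked against the original table before use, the six-residue window is an assumed default, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Per-residue hydrophilicity values as usually quoted for Hopp and Woods (1981).
# Assumption: verify against the original publication before relying on them.
HOPP_WOODS: Dict[str, float] = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0,
    "S": 0.3, "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0,
    "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(
    sequence: str,
    window: int = 6,
    scale: Dict[str, float] = HOPP_WOODS,
) -> List[Tuple[int, float]]:
    """Return (1-based window start, mean hydrophilicity) for each window.

    Raises KeyError if the sequence contains a residue missing from `scale`.
    """
    values = [scale[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        return []
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + 1, mean))
    return profile
```

Plotting the second element of each pair against the first gives the profile; maxima mark stretches predicted to be exposed to the solvent, and minima mark hydrophobic stretches.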

The surface location of many chain termini in proteins (Thornton & Sibanda, 1983) probably
explains why the surface terminal residues are so often implicated in protein epitopes. However,
it is also possible that the antigenicity of chain termini could be due to their high relative
mobility compared to other more constrained sections of the polypeptide chain (Westhof et al.,
1984).

It has been shown that segments of highest local mobility (with an amplitude of only a few
Å) correlate with highly accessible regions at the surface of the protein (e.g. loops or reverse
turns) as well as with the positions of continuous epitopes of a length of 5-10 residues
(Westhof et al., 1984 ; Tainer et al., 1985). In a recent study, a number of continuous epitopes
of myohemerythrin were identified by measuring the ability of synthetic hexapeptides to bind to
myohemerythrin antibodies (Geysen et al., 1987). The study included all 113 possible
hexapeptides encompassing the 118 residues of the protein. The relative degree of antigenicity
of the peptides was assessed by the number of rabbits immunized with the protein that responded
to each peptide as well as by mean antiserum titre. The five peaks in the mobility curve were
found to correspond to regions of higher than average antigenicity. It seems that antibodies
preferentially tend to recognize short peptides when these correspond to mobile segments of a
native protein. It should be emphasized that the magnitude of the motions found in segmental
mobility is small (1-2 Å) and that contrary to some claims (Novotny & Haber, 1986), the
energetic cost for binding is therefore not necessarily prohibitive. Movements of a few Å within
the epitope may have a beneficial effect on the critical positioning of residues and this could
result in an induced fit and increased binding affinity (Edmundson & Ely, 1986). Such a dynamic
view of immunological complementarity is in line with the widespread recognition that the
functional activity of proteins is often linked to dynamic conformational changes. However,
some protein chemists and crystallographers reject such a dynamic interpretation and argue
that the location of epitopes simply correlates with the most exposed regions of the protein
surface (Fanning et al., 1986 ; Novotny & Haber, 1986). Proponents of this viewpoint believe
that static surface accessibility is sufficient to explain the location of epitopes in proteins
and they consider accessibility and mobility as two mutually exclusive explanatory categories.
There is, however, good evidence that static accessibility is not always a necessary and
sufficient condition for antigenicity (Wilson et al., 1984 ; Van Regenmortel et al., 1986 ;
Rodda et al., 1986).

Clearly, parameters such as surface accessibility, chain termination, mobility and
hydrophilicity are not independent variables but are interconnected. Attempts to single out one of
these properties as a primary explanation for antigenicity and therefore to ignore other
correlations may be counterproductive, since it is likely to limit our capacity to predict which
regions of a protein may correspond to continuous epitopes.

The molecular dissection of protein antigens is not only of interest because it increases
our understanding of immunological specificity, but also because knowledge of the antigenic
structure of proteins makes it possible to manipulate the immune system and gives rise to many
useful applications in molecular biology, microbiology and biotechnology. The capacity of
synthetic peptides to elicit antibodies that cross-react with the corresponding complete
protein is being used to develop a new generation of synthetic vaccines (Stewart & Howard,
1987) and has already produced a rich harvest of new reagents for isolating and characterizing
gene products (Lerner, 1984 ; Walter, 1986).
COMPUTER PREDICTION OF PROTEIN SURFACE
FEATURES AND ANTIGENIC DETERMINANTS 



Thomas Hopp, Ph.D. 
Immunex Corporation 
51 University Street, Seattle, WA 98101 

For many years it has been appreciated that the 
arrangement of amino acids in the linear sequence of a 
protein is responsible for the three dimensional structure 
of the folded protein. However, until recently very little 
practical information could be obtained from amino acid
sequences, because of our imperfect understanding of the 
way that the individual amino acids influence the 
conformation of a peptide chain. At present there are 
several useful ways of predicting conformation from
sequence, including methods based on energy
minimization, frequency of occurrence of amino acids in
particular secondary structures, and consideration of the 
solubility of the amino acids in aqueous and organic 
solvents. These methods seldom generate information that 
is useful on a practical level, partly because they attempt 
to predict too much detailed information from a given 
sequence. In the development of my method, I asked a 
simpler question, namely, is it possible to predict the 
locations of antibody binding sites on a protein sequence,
regardless of any consideration of the precise 
conformation of the peptide chain? Using this criterion 
and a data set of twelve well known protein antigens, I 
developed a simple hydrophilicity analysis that reliably 
predicts the locations of antigenic residues in protein 
sequences. A listing of experiments where the outcome 
was predictable by my method is presented in Table I. 
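
The site numbers used in Table I can be read as ranks of peaks in a hydrophilicity profile. The sketch below assumes that sites are numbered in order of decreasing peak height (consistent with the later reference to "the second highest prediction peak") and that two peaks count as distinct only if their window starts are at least a window length apart; both are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple


def rank_sites(
    profile: List[Tuple[int, float]],
    n_sites: int = 3,
    min_separation: int = 6,
) -> List[Tuple[int, int, float]]:
    """Pick the highest, mutually separated points of a profile.

    `profile` is a list of (position, averaged hydrophilicity) pairs.
    Returns (site number, position, value), site #1 being the highest peak.
    """
    # Visit profile points from most to least hydrophilic.
    ordered = sorted(profile, key=lambda point: point[1], reverse=True)
    chosen: List[Tuple[int, float]] = []
    for position, value in ordered:
        # Skip points that sit on the shoulder of an already chosen peak.
        if all(abs(position - other) >= min_separation for other, _ in chosen):
            chosen.append((position, value))
        if len(chosen) == n_sites:
            break
    return [(rank + 1, pos, val) for rank, (pos, val) in enumerate(chosen)]
```

With a made-up profile such as `[(1, 0.5), (2, 2.0), (3, 1.9), (9, 1.5)]` and `n_sites=2`, position 3 is passed over as part of the same peak as position 2, and position 9 becomes site #2.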

TABLE I 

Protein Surface Features Predicted by Hydrophilicity Analysis 
A. Antigenic Determinants

1. Influenza hemagglutinin: Site #1 synthetic peptide
immunogen is protective in mice. Muller et al., PNAS

2. Influenza hemagglutinin: Sites #1, 2, and 3 contain
antigenically important amino acids. Wiley et al.,
Nature 289, 373 (1981).

3. Influenza hemagglutinin: Sites #1 and 3 synthetic
peptide immunogens are protective in mice. Shapira
et al., PNAS 81, 2461 (1984).

4. Influenza hemagglutinin: Site #1 (X31 strain) is
contained in a synthetic peptide immunogen recognized
by T-cells. Lamb et al., Nature 300, 66 (1982).

5. Streptococcal M protein: ... synthetic peptide
immunogen is protective. Beachey et al., PNAS 81,
2203 (1984).

6. Poliovirus VP1: Sites #1, 3, and 5 synthetic peptide
immunogens stimulate neutralizing sera. Emini et al.,
Nature 304, 699 (1983).

7. Poliovirus VP1: Site #3 is a neutralizing epitope.
Evans et al., Nature 304, 459 (1983).

8. Poliovirus VP1: Site #1 synthetic peptide reacts with
neutralizing antibodies. Wychowski et al., EMBO J.
2, 2019 (1983).

9. Foot and mouth disease virus VP1: Sites #1 and 3 of
strain ... synthetic peptide immunogens raise
neutralizing antisera. Bittle et al., Nature 298, 30
(1982).

10. Foot and mouth disease virus VP1: Site #3 synthetic
peptide immunogen raises neutralizing antisera. Pfaff
et al., EMBO J. 1, 369 (1982).

11. Hepatitis B surface antigen: Site #1 reacts with
antisera raised against HBsAg. Hopp and Woods,
PNAS 78, 3824 (1981).

12. Hepatitis B surface antigen: Site #1 synthetic
peptides are immunogenic in mice. Prince et al.,
PNAS 79, 579 (1982).

13. Hepatitis B surface antigen: Site #1 synthetic
peptide is immunogenic. Bhatnagar et al., PNAS 79,
4400 (1982).

14. Hepatitis B surface antigen: Sites #2, 4, and 5 are
contained in synthetic peptides that stimulate
precipitating sera to HBsAg. Lerner et al., PNAS
78, 3403 (1981).

15. Hepatitis B surface antigen: Site #3 contained in
synthetic peptide that stimulates ... subtype
antibodies. Dreesman et al., Nature 295, 158 (1982).

16. Hepatitis B surface antigen: Site #3 synthetic
peptide immunogen is partially protective. Gerin et
al., PNAS 80, 2365 (1983).

17. Influenza neuraminidase: Sites #1 through 4 all
contain antigenically important residues or are
immediately adjacent to them. Colman et al., Nature
303, 41 (1983).

18. Ragweed allergen protein RA5: Site #1 is an
important allergenic determinant. Roebber et al., J.
Allergy and Clin. Immunol. 71, 162 (1983).

19. Herpes virus gpD: Site #1 of the extracytoplasmic
portion, synthetic peptide raises neutralizing
antisera. Cohen et al., J. Virology 49, 102 (1984).

20. Histocompatibility antigen H2 Kb: Site #1 in second
domain is the "gene conversion" site causing the
antigenic specificity change in the H2 K mutant.
Schulze et al., PNAS 80, 2007 (1983).

21. Histocompatibility antigen HLA B7: Site #2 in second
domain is an alloantigenic site. Lopez de Castro et
al., Biochem. 22, 3961 (1983).

22. Histocompatibility antigen DR alpha chain: Site #2
synthetic immunogen used to make hybridoma. Niman
et al., PNAS 80, 4949 (1983).

23. Histocompatibility antigen DR beta chain: Site #2
synthetic immunogen used to make hybridoma. Niman
et al., PNAS 80, 4949 (1983).

24. Beta 2 microglobulin: Site #2 recognized by a
monoclonal antibody. Parham et al., J.B.C. 258,
6179 (1983).

25. Myelin basic protein: Site #1 is an encephalitogenic
determinant. Hashim, Immunol. Rev. 39, 60 (1978).

26. Scorpion toxin II: Sites #1 and ... are antigenic.
Granier et al., Int. J. Peptide Protein Res.
(1984).

27. Immunoglobulin gamma chain: Site #1 of third
constant domain is the Gm(a) allotypic marker. Kehoe
and Kehoe, Immunochemistry of Proteins 3, 87

28. Interferon alpha 1: Site #1 synthetic peptide raises
antibody to interferon. Arnheiter et al., PNAS 80,
2539 (1983).

29. Interleukin 2: Site #1 synthetic peptide raises
antibody to interleukin 2. Altman et al., PNAS 81,
2176 (1984).

30. Myoglobin: Site #1 synthetic peptide ...
production of macrophage inhibitory factor (MIF) by
cultured lymph node cells. Stavitsky et al.,
Immunochem. 12, 959 (1975).

31. Myoglobin: Site #1 synthetic peptide used to raise a
hybridoma antibody. Schmitz et al., Molec. Imm. 20,
719 (1983).

32. Cytochrome c: Site #1 causes delayed type
hypersensitivity and T-cell transformation. Wang and
Reichlin, Molec. Imm. 16, 805 (1979).

33. Metallothionein: Sites #1 and 3 are autoimmune
antigenic sites. Winge and Garvey, PNAS 80, 2472
(1983).

34. Rous sarcoma virus transforming protein (src): Site
#2 synthetic peptide immunogen raises sera that
neutralize tyrosine kinase activity, cross react with
yes-transforming protein (where it is Site #1) and
precipitate possible cellular analogs. Gentry et al.,
J.B.C. 258, 11219 (1983).

35. Polyoma virus middle T antigen transforming protein:
Site #1 synthetic peptide immunogen raises sera that
react with middle T as well as a normal cellular
protein. Ito et al., J. Virology 48, 709 (1983).

B. Interaction Sites 

36. Immunoglobulin gamma chain: Site #1 of second
constant domain is the C1q binding site. Prystowsky
et al., Biochem. 20, 6349 (1981).

37. Calmodulin: Sites #3 and 4 are calcium binding sites.
Waterson et al., J.B.C. 255, 962 (1980); Sasagawa et
al., Biochem. 22, 2565 (1982).

38. Influenza hemagglutinin: Site #1 of X31 strain is the
proteolytic processing site for cell fusion activity.
Hopp and Woods, Molec. Immunol. 20, 483 (1983).

39. Fibronectin: Site #3 synthetic peptide has the cell
binding activity of the whole molecule. Pierschbacher
and Ruoslahti, Nature 309, 30 (1984).

40. Hepatitis B surface antigen: Site #1 contains the
asparagine residue that is preferentially glycosylated
over other asparagines. Peterson, J.B.C. 256, 6975
(1981).

41. Histocompatibility antigens HLA B7 and H2 Kb: Site
#1 in the cytoplasmic domain contains the
phosphorylatable threonine or serine residue. Pober
et al., PNAS 75, 6002 (1978); Bregegere et al.,
Nature 292, 78 (1981).

42. Polyoma virus middle T antigen transforming protein:
Site #1 is immediately adjacent to tyrosine 315 which
is phosphorylated. Hunter et al., EMBO J. 3, 73
(1984).

C. Miscellaneous

43. Fava bean lectin: Site #1 of the beta chain is the
location of the annealing site for circular permutation
of the genes for favin and Con A. Cunningham et
al., PNAS 76, 3218 (1979).

It is clear from the number of entries in this list and
the variety of the objectives achieved, that my prediction
method is highly successful and has broad applicability to
problems in immunology as well as the field of general
protein chemistry. While no attempt was made to present
an exhaustive list, the examples cited here demonstrate the
potential for using hydrophilicity analysis in many areas,
including the elucidation of the antigenic structures of
pathological organisms. The examples in Table I include
all of the well characterized major disease organisms
presently under investigation. The predictable antigenic
sites have been found to possess the full range of known
immunological phenomena, including antibody production
(precipitins, neutralizing and protective sera),
delayed-type hypersensitivity, allergic responses,
autoimmunity, encephalitic responses, T-cell proliferative
responses, graft rejection, and lymphokine production. In
addition, many unexpected examples of other protein
surface sites have been listed. These include
protein-protein interaction sites such as the complement
binding region of immunoglobulin, the protein-cell surface
interaction site of fibronectin and the protein-metal
interaction sites of calmodulin. Other interaction sites
comprise locations of post-translational modification of
peptide chains, including proteolytic processing sites and
sites of phosphorylation and carbohydrate attachment.

When the wealth of information listed above is
considered, it is clear that hydrophilicity analysis should
be extremely useful in the analysis of molecular phenomena
related to carcinogenesis. In particular, there has become
available a huge body of sequence information on the
proteins involved in transformation, both in spontaneously
occurring tumors and in virally induced tumors. These
sequences, as well as the genomic sequences of a variety
of oncogenic viruses, have been obtained for the most part
by nucleic acid sequencing, and therefore little or nothing
is known about the structures of the proteins produced
from the sequences. In this regard it is instructive to
consider several examples of experiments suggested by
hydrophilicity analysis of tumor virus genome encoded
proteins. The first example is investigation of the
hydrophilicity properties of the two protein products of
the env region of the recently described adult thymic
leukemia virus (ATLV) genome. In figure 1, the
hydrophilicity profile for the heavy chain of the env
translation product is compared to two other viral envelope
proteins, those of the influenza virus (HA1) and the
hepatitis B virus (HBsAg). This comparison seems
especially interesting because it has been noted that the
heavy and light chains of the retroviral env translation
products bear a general resemblance to the two chains of
the influenza hemagglutinin (HA1 and HA2). A comparison
of the ATLV env light chain to HA2 and to a membrane
glycoprotein product of the early region of adenovirus
(ADVE16) is shown in figure 2. The most striking
similarity among these plots is between the HA1 and env1
heavy chains. A number of common features lead me to
conclude that these two proteins share closely similar three
dimensional structures, even though they have not been
reported to be homologous.

The HA1 and env1 chains are obviously similar in
length, although env1 is slightly shorter. More
significantly, the hydrophilicity profiles show a great
number of common characteristics. Each has a broad
hydrophobic valley near the N-terminus. This region of
HA1 makes up the central strand of the membrane
associated globular domain. At their C-termini, the two
proteins again show a similar feature in the large terminal
peak. This feature is associated with the known
proteolytic processing site of the influenza hemagglutinin
and the proposed processing site of the env product. The
profile for HBsAg is included in figure 1 to show that not
all envelope proteins share such hydrophilicity profiles.
While HBsAg does have an N-terminal hydrophobic valley,
it clearly lacks a C-terminal peak. It is also obviously
different in being much shorter than HA1 or env1, and in
having a broad central hydrophobic valley that may
actually be a membrane spanning segment. Although HA1
and env1 both have a number of hydrophobic valleys in
their central regions, neither has one of sufficient length
to span a membrane. This central region of the HA1 chain
is known to comprise the globular domain at the distal end
of the hemagglutinin spike and to contain the binding site
Figure 1. Hydrophilicity analysis of viral surface
antigens. The bar above the profile for HA1 indicates the
extent of the globular cell-binding domain. Below the
profile, the solid bars represent β-strands and the hatched
bars represent helices.



[Figure 2 plot: hydrophilicity profile, panel "HA2 Hong Kong"; horizontal axis: sequence position]


Figure 2. Hydrophilicity analysis of surface antigen light
chains. In the HA2 plot, solid bars represent β-strands
and hatched bars represent helices; F, membrane fusion
region; TM, transmembrane anchoring segment.
for cell-surface sialic acid. The core of this cell binding
domain is formed into an eight stranded β-pleated sheet
with several associated short helical stretches. These
repeating secondary structures are reflected in the
hydrophilicity profile by the series of large peaks and
valleys in the central part of the plot. Most of the valleys
are related to the different β-strands, while the peaks
represent the highly exposed chain turns at the ends of
the strands. In this light, it is most interesting to note
that the env1 hydrophilicity plot also shows a series of
large peaks and valleys in its central region, suggesting
that it, too, contains a globular domain composed of
repeating β-strands. It is not possible to align all of the
peaks and valleys of HA1 and env1, so there must be
significant differences between the two proteins as well.
It is not possible, from these findings, to make a case in
favor of any homology or evolutionary relationship between
these two proteins; however, given the number of general
similarities, it would not be surprising if they had indeed
evolved from a common ancestor. Most significantly, if
these two proteins do share similar overall three
dimensional structures, then important antigenic
determinants for the neutralization of ATL virus must be
located on the hydrophilic peaks found throughout the
central portion of the env1 protein. In particular, amino
acids 90-95, 142-147, and 230-235, comprising the second,
third, and fourth highest peaks for env1, should be
explored for usefulness as synthetic peptide vaccines
against this disease. Past experience indicates that the
homologous portions of the env products of the related
AIDS associated viruses should also be considered as
vaccines against that disease.

In figure 2, several similarities are apparent between
env2 and HA2. Each protein has an N-terminal
hydrophobic valley, which is associated with the membrane
fusion activity in HA2. Both proteins have long
hydrophobic stretches near their C-termini, likely to be
membrane anchoring regions. It is also clear that there
may be substantial differences between the two, because
HA2 is significantly longer, and its profile shows more of
the short-period spikiness associated with the large helix
content of the molecule. The shorter env2 protein may
have less of this helix, although it does contain some of
the short-period spikes in addition to the pronounced
peaks and valleys that may imply a greater content of
β-strands. Interestingly, a more convincing structural
similarity is suggested by comparing the plot for env2 to
that for adenovirus E16. Although the E16 membrane
protein has not been implicated as a viral structural
protein, it is intriguing to note the apparent similarity of
the hydrophilicity plots for membrane associated proteins
of two very different types of oncogenic viruses.

Examples of hydrophilicity analysis of another
important family of cancer related proteins are shown in
figure 3. The two retroviral oncogene products, src and
erbB, each contain a region of homology to each other, to
protein kinase, and to the cytoplasmic portion of epidermal
growth factor receptor. Antibodies raised against a
synthetic peptide comprising the second highest prediction
peak for src (residues 498 to 512) have been shown to
neutralize its tyrosine kinase activity, and to precipitate
the homologous yes transforming protein as well as the
normal cellular analog of src. It would be interesting to
raise antibodies specific to the highest peak for src
(residues 155 to 160) because this region of the molecule is
not homologous to the erbB product or EGF receptor.
Such an antiserum should be useful in characterizing the
function of this region of the src molecule because it
should cross react strongly with the recently characterized
cellular c-src gene product, which is identical in this
region, but may be non-cross reactive with yes, which has
many amino acid substitutions at this site. It is likely
that studies of the antigenic peptides predicted for the
erbB protein would also be quite useful, because predicted
site #1 has been proposed as a major site of tyrosine
phosphorylation in erbB, and site #3 is homologous to the
tyrosine phosphorylation site in src. Antibodies to these
two sites would clarify the role of tyrosine phosphorylation
in erbB, and, because the amino acids around site #1 are
highly conserved between erbB and EGF receptor, the
same antibodies should cross react with the receptor as
well. Another region of interest on erbB is at the
N-terminus, where the first 60-70 residues have been
proposed to reside on the outside of the cell plasma
membrane. Antibodies specific to this region could be
raised using synthetic peptides comprising one or more of
the hydrophilic regions found there. Using these
antibodies and the ones described above it should be
possible to begin a molecular dissection of this important
group of transforming proteins.
T CELL CLONES SPECIFIC FOR AN AMPHIPATHIC
α-HELICAL REGION OF SPERM WHALE MYOGLOBIN
SHOW DIFFERING FINE SPECIFICITIES FOR 
SYNTHETIC PEPTIDES 
A Multiview/Single Structure Interpretation of Immunodominance 

By KEMP B. CEASE, IRA BERKOWER, JENA YORK-JOLLEY, and
JAY A. BERZOFSKY

From the Metabolism Branch, National Cancer Institute, National Institutes of Health,
Bethesda, Maryland 20892; and the Bureau of Biologics, Food and Drug Administration,
Bethesda, Maryland 20892

Characterization of immunodominant T cell sites has been effectively
performed by a number of laboratories (reviewed in 1) using protein sequence
variants, cleavage fragments, and synthetic peptides. Some studies (2) have been
interpreted as supportive of the possibility of multiple conformations of peptide
antigen and/or multiple antigen-binding sites on the Ia molecule. The possibility
that distinct T cell specificities might reflect distinct recognition or views of a
single peptide conformation associated with a single Ia site has received little
attention, primarily because little could be inferred about the conformation of
the antigenic peptide on the APC in most experimental systems studied.

We have previously described an immunodominant site in sperm whale
myoglobin in a region encompassing glutamic acid 109, identified using myoglobin
sequence variants (3), and have subsequently isolated T cell clones with the same
reactivity pattern (4). In this paper we characterize two such clones using a panel
of synthetic peptides. The clones showed different response patterns that are
found to be totally consistent with a model of the distinct T cell specificities
reflecting distinct "views" of an amphipathic α-helical conformation. Thus, when
one considers the likely secondary structure of antigen existing in association
with the complex structure of the Ia molecule, distinct T cell recognition
specificities need not imply distinct structural forms of antigen or sites of antigen
binding, but rather may reflect distinct views recognized by the T cell receptor.

Materials and Methods 

Mice. B10.D2 and (B10.D2 x B10.BR)F1 mice were obtained from The Jackson
Laboratory (Bar Harbor, ME).

T Cell Clones. T cell clone 9.27 was derived from B10.D2 mice as described (4). T cell
clone 1.2 was derived independently from (B10.D2 x B10.BR)F1 mice as described and
has been referred to as F1(D2)1.2 in previous studies (5). Both clones are specific for the
glutamic acid 109 region of sperm whale myoglobin and are restricted to I-A^d (5).

Synthetic Peptides. Synthetic peptides of sperm whale myoglobin 102-118, 104-118,
106-118, 108-118, 109-118, and 110-118 were synthesized by manual solid-phase peptide
synthesis using a modification of the method of Corley et al. (6) and purified to
homogeneity by gel filtration on BioGel P4 followed by reversed-phase HPLC.
Concentration and composition were determined by amino acid analysis
kindly performed by Robert Boykins (Food and Drug Administration).

FIGURE 1. Proliferative responses of T cell clones 1.2 and 9.27 to synthetic myoglobin
peptides corresponding to residues 102-118 (Lys-Tyr-Leu-Glu-Phe-Ile-Ser-Glu-Ala-Ile-Ile-
His-Val-Leu-His-Ser-Arg) or portions thereof presented by H-2^d APCs. The abscissa is
antigen concentration (μM). Assays were performed as described in the text.
Background-subtracted geometric means are shown with confidence intervals.
(A) clone 1.2. (B) clone 9.27. The background thymidine incorporation
without antigen was 2,298 cpm for clone 1.2 and 3,493 cpm for clone 9.27.

T Cell Proliferation Assay. Assays were performed as described previously (4. &). 



Results and Discussion 
Previous studies (4, 5) using sequence variants of native myoglobin had shown
that the sites in native sperm whale myoglobin seen by these clones included
glutamic acid 109 and possibly histidine 116. Available sequence variants do not
enable higher resolution analysis by this approach. Thus, we synthesized the
peptides in the nested series from 102-118 to 110-118. Fig. 1, A and B show
peptide dose-response curves for clones 1.2 and 9.27. Both clones respond well
to 102-118 and 106-118 (Fig. 1 and Table I). However, clone 1.2 responds
much better to 104-118 than to 108-118, whereas the reverse is true for clone
9.27. The rank order of potency for clone 9.27 is not a simple function of
peptide length, as 104-118 is less potent than the shorter peptides 106-118 and
108-118 even though it contains all of the sequence present in these latter
peptides. The decrease in activity from peptide 106-118 to peptide 104-118 is
then reversed when the peptide is further lengthened to 102-118. This result
further indicates that activity is not simply a function of peptide length. Though
not shown in this experiment, the potency of peptide 102-118 for stimulating
clone 9.27 was consistently greater than or equal to that of peptide 106-118.
Neither clone responded to 110-118. In subsequent experiments 109-118 was
found to be inactive for both clones (Table I).

Table I
Proliferative Response of Clones 9.27 and 1.2 to Peptides 109-118 and 102-118

                      Geometric mean [3H]thymidine incorporation
                      (cpm) at an antigen concentration of:
Clone   Peptide       OmM               1.0
9.27    109-118       296 (1.15)*       257 (1.97)
        102-118       151,060 (1.05)    119,378 (1.09)
        No antigen    260 (1.27)        238 (1.35)
1.2     109-118       570 (1.66)
        102-118       169,457 (1.03)    173,410 (1.03)
        No antigen    402 (1.52)

Assays were performed as described in the text.
* Geometric means are shown with the SEM factor in parentheses.
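
The table footnote reports geometric means with an "SEM factor" but does not say how the factor is computed. One common reading, assumed here and not stated in the text, is that replicate counts are averaged on a log scale and the standard error of the log values is exponentiated into a multiplicative factor. A minimal Python sketch under that assumption, with hypothetical replicate counts:

```python
import math


def geometric_mean_and_sem_factor(counts):
    """Return (geometric mean, multiplicative SEM factor) for positive counts.

    Assumption (not from the paper): the SEM factor is exp(SEM of the
    natural-log values), so the interval runs from mean / factor to
    mean * factor.
    """
    logs = [math.log(c) for c in counts]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the log values (n - 1 in the denominator).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    sem_log = math.sqrt(var_log / n)
    return math.exp(mean_log), math.exp(sem_log)


# Hypothetical triplicate cpm values, not taken from the paper.
gm, factor = geometric_mean_and_sem_factor([100.0, 200.0, 400.0])
print(f"{gm:.0f} ({factor:.2f})")
```

Under this reading a factor near 1.0, as in most rows of Table I, indicates closely agreeing replicates.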



Thus, the identification of the region around residue 109 as the immunodominant
site was confirmed using synthetic peptides, and residues on both sides of
109 in the sequence appear to contribute to antigenicity. While these data show
a consensus segment from 106-118 for stimulation of the clones, they also reveal
a differential response pattern to longer and shorter peptides. 

This segment folds into a highly amphipathic α helix in native myoglobin with
the hydrophobic residues on one face and the hydrophilic on the opposite face
(Fig. 2A). Refolding of this peptide into this conformation in the hydrophobic/
hydrophilic interface at the surface of the presenting cell should be energetically
favored. The data we present here on the 102-118 region, along with our
previously reported data (7a) on the 132-146 site of sperm whale myoglobin,
led us to hypothesize (1) that the amphipathic helix may be a general feature
common to many immunodominant T cell sites. Indeed, a sequence consistent
with formation of an amphipathic helix is seen in the majority of T cell antigenic
sites described to date. Thus, we suggest that such structures may frequently
represent an integral part of the stimulation complex for the T cell receptor.

If in fact an α-helical antigen conformation is presented to and recognized by
the T cell receptor, simple end effects and folding patterns that differentially
affect one or the other T cell clone recognition regions would appear likely (Fig.
2B). For instance, clone 1.2, which is more sensitive to end effects at 108, that
is to say, does not respond to peptide 108-118, may include this residue in its
epitope (see Fig. 2B). In contrast, the N-terminal limit of the epitope recognized
by clone 9.27 may be Glu 109 itself, so that adding on just one more residue at
position 108, to mask the α-amino group of 109, is sufficient to stimulate the
clone. Thus, these data are consistent with a model of multiple T cell specificities
arising from multiple views of a single antigen conformation at a single Ia-binding
site and do not require postulation of multiple conformations or binding sites.

FIGURE 2. Secondary structure presentation for the 102-118 region of sperm whale
myoglobin. In the helical net display, the cylinder of the helix is cut longitudinally
and folded flat (7). (A) The amphipathic character of this segment. Hydrophobic residues are
marked and tend to fall on one face of the helix while the hydrophilic residues are positioned
on the opposite face. (B) Hypothetical T cell receptor recognition envelopes for clones 1.2
and 9.27 (see text).

Multiple distinct functional sites on each Ia molecule have been proposed (2,
8-13) to explain the results of antibody blocking and Ia mutant studies, as well
as to account for findings of distinguishable T cell specificities for a given
antigenic site. Among these, elegant studies by Allen et al. (2, 13) have focused
on the hen egg lysozyme (HEL) system using molecular variations in antigen and
Ia to probe the specificity of a panel of T cell hybridomas. Two T cell hybridomas
specific for HEL 46-61 in association with I-A^k, but differing in fine specificity,
were found (13, 14) to be differentially sensitive to two mutations in I-A^k. As in
the case of the bm12 mutant (9, 10), these results could be interpreted equally
well in terms of two disjoint sites on Ia or a single Ia site recognized differently
by two T cells that are differentially sensitive to these substitutions. The recent
finding that these mutations are only three residues apart (14; Allen, P., and D.
McKean, personal communication) supports the latter interpretation. Given that
Ia, a member of the Ig gene superfamily, shares homology and domain structure
with Fab, it is not unreasonable to suppose that the several hypervariable regions
may cluster to form a single combining site for antigen. Indeed, the fact that a
single mutation at position 67 resulted in loss of stimulation of T cell hybridomas
specific for at least three distinct peptides of HEL representing over half of the
cases characterized (13) further supports a single site. In the other cases, antigen
could still be in the same conformation at the same site on Ia but viewed by the
T cell slightly differently in association with different adjacent Ia residues. The
intriguing difference in rank order of potency of different antigenic peptides for
two hybridomas (2) can be explained by the presence of a turn at residues 55-56,
as in native HEL, such that additional residues introduce stimulatory
information for one clone but only steric hindrance for the other. Other studies
(15-17) have emphasized the importance of distinct T cell specificities for a
common antigenic site but have not suggested structural interpretations. These
cases are, however, also consistent with a multiview model of T cell recognition
of antigen in a single conformation at a single site. Though all of the data,
including our own, are also consistent with models proposing multiple antigen
conformations and multiple Ia antigen-binding sites, we would argue that the
existing data do not necessitate postulating these more complex models. Thus,
distinct T cell recognition specificities need not imply distinct structural forms
of antigen or sites of antigen binding but rather may reflect distinct views
recognized by the T cell receptor.

The finding of differing T cell clone fine specificities within an immunodom- 
inant site additionally suggests that immunodominance represents the focusing 
of a polyclonal response on a limited region of the antigen and is not a simple 
monoclonal T cell expansion. Other findings are also consistent with this
interpretation (2, 15-17; Livingstone, A., J. Rothbard, and C. G. Fathman, personal
communication).

Summary 

The T cell response to sperm whale myoglobin in the H-2^d haplotype has been
shown to be largely focused on a limited region around glutamic acid 109
recognized in association with I-A^d. T cell clones 9.27 and 1.2 have been
previously (4, 5) shown to reflect this specificity and MHC restriction. In this
study we have used a panel of synthetic peptides from the region 102-118 of
myoglobin to characterize the specificities of these representative clones. The
segment from 106-118 was found to represent a consensus region for recognition
by both clones. However, we saw significant differences between clones in the
hierarchy of responsiveness to peptides within the panel. Inasmuch as the
peptide and the I-A^d molecule remain constant, these differences derive from
differences in how each T cell receptor interacts with the antigen. This peptide
segment is an amphipathic α helix in native myoglobin, meaning that one side is
hydrophobic and the other hydrophilic. It is one of the prototype cases that led
us to find that amphipathic helices constitute the majority of immunodominant
sites recognized by helper T cells (1). It is likely that the peptide will refold into
an amphipathic helix stabilized by the interface at the surface of the presenting
cell. When such secondary conformation is considered, these data are consistent
with a model of multiple T cell specificities arising from multiple views of a single
antigen conformation at a single Ia-binding site and do not require postulation
of multiple conformations or binding sites.

Additionally, the finding of distinct specificities suggests that the immunodominance
of this site depends not on the dominance of a single clone, but on the
focusing of a polyclonal response on a single region of the molecule in association
with I-A^d. The immunodominance of this particular region of the protein may
thus depend on intrinsic features of the site, such as potential to form an
amphipathic helix, as well as extrinsic factors such as binding properties of the
I-A^d molecule.

We are grateful to Dr. Richard Hodes for critical reading of the manuscript and to Dr. 
Hodes and Dr. Alfred Singer for helpful discussion. 

Received for publication 2 June 1986 and in revised form 31 July 1986.

The ability to predict T cell antigenic peptides
would have important implications for the development
of artificial vaccines. As a first step towards
prediction, this report uses a new statistical technique
to discover and evaluate peptide properties
correlating with T cell antigenicity. This technique
employs Monte Carlo computer experiments and is
applicable to many problems involving protein or DNA.

The technique is used to evaluate the contribution
of various peptide properties to helper T cell antigenicity.
The properties investigated include amphipathicities
(α and β), conformational propensities
(α, β, turn, and coil), and the correlates of α-helices,
such as the absence of helix-breakers and the positioning
of the residues which stabilize α-helical dipoles.
We also investigate segmental amphipathicity.
(A peptide has this property when it contains at
least two disjoint subpeptides, one hydrophobic, one
hydrophilic.) Statistical correlations and stratifications
assessed independent contributions to T cell
antigenicity.

The findings presented here have important implications
for the manufacture of peptide vaccines.
These implications are as follows: if possible, peptide
vaccines should probably be those protein segments
a) which have a propensity to form amphipathic
α-helices, b) which do not have regions with
a propensity to coil conformations, and c) which
have a lysine at their COOH-terminus. The last two
observations are of particular use in manufacturing
peptide vaccines: they indicate where the synthetic
peptides should be terminated. These implications
are supported by the findings given below.

The significances (p values) support the following
statistical generalities about antigenic conformations:
1) most helper T cell antigenic sites are amphipathic
α-helices; 2) α-helical amphipathicity and
propensity to an α-helical conformation contribute
independently to T cell antigenicity; 3) there is evidence
that some T cell antigenic sites are β conformations
instead of α-helices; 4) T cell antigenic sites
avoid random coiled conformations; and 5) T cell
antigenic sites are usually not segmentally amphipathic.

α-Helical amphipathicity was significant, but segmental
amphipathicity was not. This has implications
for the dimensions of the structure interacting
with the hydrophobic portion of an amphipathic T
cell antigenic site.

Lysines are unusually frequent at the COOH-terminal
of T cell antigenic sites, even after accounting
for tryptic digests. These lysines can stabilize
α-helical peptides by a favorable interaction with
α-helical dipoles. This interaction, which occurs with
other charged residues and not just lysine, is probably
stronger in peptides than in native proteins
because of the terminal backbone charges in free
peptides. This stabilization may explain why alteration
of COOH-terminal lysines often destroys antigenic
activity: this experimental fact, never before
noted as a general observation, is predicted by our
theory.

Our statistics are consistent with a "conformational
hypothesis": helper T cell immunodominant
sites tend to be peptides with strong conformational
propensities that stabilize under hydrophobic interaction
with some structure on the antigen-presenting
cell, possibly a class II major histocompatibility
complex protein. The conformational hypothesis is
an extension of the amphipathicity hypothesis,
which does not consider conformational propensities.
Because small peptides do not commonly take
stable conformations, our results support the quite
reasonable notion that immunodominant sites are
often those peptides most able to present the T cells
with a consistent conformational picture.

The studies presented here detect several properties
of the amino acid sequences of antigenic peptides
which correlate with helper T cell immunodominance.
These properties suggest fundamental
chemical rules governing T cell recognition of antigens.
In addition, these properties would be valuable
for incorporation into the rational design of any
synthetic vaccine.

Prediction of T cell antigenic peptides would have important
implications for the development of artificial vaccines.
Such vaccines would be particularly useful in diseases
such as leprosy, caused by organisms which are
hard to culture and for which the cellular arm of the
immune system is the principal defense. Even when
antibody production is the primary goal of vaccination, a
secondary or anamnestic response requires the induction
of helper T cell immunity. Prediction of peptides for use
as vaccines requires discovery and confirmation of properties
correlating with T cell antigenicity. The purpose of
this report is to find such properties for the case of helper
T cells.

There are, for our purposes, two antigenic challenges
which raise a T cell response: a) challenge by native
protein (N), and b) challenge by a short peptide segment
(P) produced either by synthesis or by cleavage from the
native protein. There are four possibilities for primary,
followed by secondary, challenge: NN, NP, PN, and PP.
Although artificial vaccines are PN (peptide vaccination
should immunize against native protein), most laboratory
experiments are either NN or NP.

NN and NP experiments can localize immunodominant
sites, i.e., minimal peptides within a native protein which
are the focus of T cell response. This report will call the
experimental peptides containing the immunodominant
sites antigenic sites. ("Antigenicity" in this paper always
refers to T cell antigenicity.) If the NN or NP immunodominant
sites were good PN sites, they would be candidate
peptides for artificial vaccines. Apart from pure scientific
interest, this hypothesis motivates determination of
immunodominant sites. Immunodominant sites usually are
found by trials using many peptides; systematic prediction
could speed their investigation considerably.

In vivo, an antigenic protein probably passes through
the following three main steps before raising a helper T
cell response. 1) Processing: an antigen-presenting cell
(APC),¹ usually a macrophage, dendritic cell, or B cell,
ingests the protein and then digests it into smaller peptides
(1, 2). 2) Presentation: these peptides are then presented
to T cells, probably in conjunction with a class II
major histocompatibility complex (MHC) protein on the
APC surface (3-5). 3) Recognition: a helper T cell receptor
then recognizes some combination of peptide and class II
protein, and initiates a T cell response.

Two antigenic properties are currently thought to contribute
to this process: amphipathicity and α-helicity.

A structure is amphipathic when it has both a hydrophobic
portion and a hydrophilic portion (6). A peptide is
segmentally amphipathic when the peptide contains at
least two disjoint subpeptides, one hydrophobic, the other
hydrophilic. We call a peptide α-amphipathic if, when
the peptide is put into an α-helical conformation, one side
of the α-helix is hydrophobic, the other side hydrophilic.
Both segmental amphipathicity and α-amphipathicity are
believed to contribute to T cell antigenicity, although
opinions about their relative importance differ (7-10).
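
The definition of α-amphipathicity above can be made concrete with a hydrophobic moment: each residue's hydropathy is treated as a vector pointing outward from the helix axis, with successive residues rotated by about 100 degrees, and the vectors are summed. The sketch below is an illustration only; the Kyte-Doolittle hydropathy scale, the 100-degree step, and the function name are assumptions introduced here and are not the site statistic used in this paper.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def helical_moment(peptide, degrees_per_residue=100.0):
    """Per-residue magnitude of the hydrophobic moment of a peptide.

    degrees_per_residue = 100 models an ideal alpha-helix (3.6 residues
    per turn); a larger value means hydrophobic and hydrophilic residues
    segregate onto opposite faces of the helix.
    """
    step = math.radians(degrees_per_residue)
    x = sum(KD[aa] * math.cos(i * step) for i, aa in enumerate(peptide))
    y = sum(KD[aa] * math.sin(i * step) for i, aa in enumerate(peptide))
    return math.hypot(x, y) / len(peptide)


# Sperm whale myoglobin 102-118, one of the antigenic sites in Table I.
print(round(helical_moment("KYLEFISEAIIHVLHSR"), 3))
```

Passing a different angle, for example one near 170 to 180 degrees, would give a rough analogue for an extended (β) conformation; that choice is likewise an assumption of this sketch.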

Present evidence suggests that some antigens assume
α-helical conformations (11, 12) which are then stabilized
by hydrophobic interactions with a class II protein on the
APC (13-15). According to the amphipathicity hypothesis
(7), these antigens would tend to be α-amphipathic, with
the hydrophobic side of the helix interacting with the
APC, and the hydrophilic side with the T cell.

¹ Abbreviation used in this paper: APC, antigen-presenting cell.

Much is known about the α-helical conformation. Certain
amino acids are helix-makers, e.g., glutamate; others
are helix-breakers, e.g., proline, glycine, and serine (16).
Also, because of the orientation of peptide bonds in their
backbones, α-helices have an intrinsic dipole (17-19),
equivalent to a charge of about +1/2 e at the NH2-terminus
and -1/2 e at the COOH-terminus (e = elementary charge).
The dipole exists even when the α-helix is part of a longer
peptide. Negatively charged residues (Asp/Glu) at the
NH2-terminus interact favorably with the dipole, as do
positively charged residues (Arg/His/Lys) at the COOH-terminus.
These interactions can help to stabilize an α-helix
and, in fact, many α-helices in native protein have
these residues in the appropriate position (18). α-Helicity,
if present, can have many implications for the composition
of antigenic peptides.
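
The rule about charged residues and the helix dipole lends itself to a simple check on a peptide's ends. The following Python sketch is illustrative only: the function name and the `window` parameter (how many terminal residues to inspect) are hypothetical and are not taken from this paper.

```python
NEGATIVE = set("DE")   # Asp, Glu: favorable at the NH2-terminus
POSITIVE = set("RHK")  # Arg, His, Lys: favorable at the COOH-terminus


def dipole_friendly_ends(peptide, window=1):
    """Return (nh2_ok, cooh_ok) for a peptide in single-letter code.

    nh2_ok is True when one of the first `window` residues is negatively
    charged; cooh_ok is True when one of the last `window` residues is
    positively charged. `window` is a hypothetical tolerance.
    """
    nh2_ok = any(aa in NEGATIVE for aa in peptide[:window])
    cooh_ok = any(aa in POSITIVE for aa in peptide[-window:])
    return nh2_ok, cooh_ok


# Sperm whale myoglobin 102-118 starts with Lys and ends with Arg.
print(dipole_friendly_ends("KYLEFISEAIIHVLHSR"))
```

A count of sites passing the COOH-terminal check is the kind of quantity that could serve as a site statistic in the Monte Carlo test described in Methods.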

The extended (i.e., β) peptide conformation is common
in native proteins and can also be amphipathic. Unlike
α-amphipathicity, β-amphipathicity is not yet implicated
in T cell antigenicity. We shall use the word β-propensity
to connote a tendency to β-conformation. Similarly, peptides
with a tendency to α-helicity have α-propensity. The
terms "turn propensity" and "coil propensity" are
self-explanatory.

Confirmation of the correlation of amphipathicities,
propensities, and other properties with immunodominance
requires a statistical test. Classical statistical
methods are inappropriate for protein analysis because
they require analytic description of the parent distribution
(is the distribution normal? chi-squared? etc.). The
"Materials and Methods" provides a novel and appropriate
statistical test for significance in protein (or DNA) data
bases, made practicable by Monte Carlo computer experiments.
This test can confirm the correlation between a
property and peptide antigenicity. Significant statistics
will be used in predictive schemes elsewhere (Margalit et
al., manuscript in preparation).

METHODS 

1. Antigenic data base. Table I lists the antigenic sites as this
paper uses them in statistical tests. The selection criteria for this
particular list are: the sites a) were reported to be immunodominant
in the response to a protein, b) were known to the authors prior to
February 21, 1986, and c) are less than 21 residues long. The
restrictions involve arbitrary cut-offs, but were necessary a) to close
the statistical data base and b) to localize immunodominant sites.
(Antigenic sites much longer than 21 residues probably do not localize
their immunodominant site sufficiently.) The entries in Table I
are, for each experiment, representative of the shortest peptide
capable of near-maximal T cell stimulation. Such peptides are usually
obvious from the experimental data: deletion of critical residues
generally produces a precipitous drop in antigenic activity. When
the experiments did not localize the end residues of an antigenic
site, the criteria given in Table I were applied to give a definite
peptide suitable for statistical testing. In the absence of a registry of
immunodominant sites, these criteria were as objective as possible.

Thus, Table I does not give the antigenic sites as reported in the
literature, but within the context of statistical examination, is as
faithful as possible to experimental results. In particular, for reasons
detailed below, in the absence of explicit information, it was safer
to treat certain peptides as though they had been produced by tryptic
or cyanogen bromide cleavage. For technical reasons explained under
"α-amphipathicity" (subsection 3 of Methods), the antigenic sites
sperm whale myoglobin 69-78 and influenza hemagglutinin 111-119
were extended to length 11 when α- and β-amphipathicity were
examined. Table I is not a summary of the relevant experiments, but
is rather a means of reproducing our statistical results.

2. Statistical methods: site statistics. Assume that any site (i.e.,
any peptide, not necessarily antigenic) within a protein can be
associated with a number, a site statistic A. The nature of this
number need not concern us yet: it may reflect α-propensity,
sequential amphipathicity, the absence of prolines within the site, etc.
Our problem is to determine the significance of the site statistic A
as a correlate of immunodominance.

Overall statistics. The mechanics of this determination are as
follows: there are 12 proteins in the data base and a total of 23
antigenic sites. a) Calculate the site statistics Ao for the antigenic
sites and then b) sum these to produce an overall statistic So for the
antigenic sites.

Statistical significance and anti-significance. We now wish to
assign a significance to So. As an example of (one-tailed) statistical
significance, imagine a normal (Gaussian) curve. The significance
of a number So is the area to the right of So under the curve. This is
the probability of drawing a random number S exceeding So from
that normal population.
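
The normal-curve illustration of one-tailed significance can be written out directly. This is a sketch of the illustration only, not of the test used in this paper, which replaces the normal curve with Monte Carlo sampling; the function name and its default parameters are assumptions.

```python
import math


def right_tail_significance(s0, mean=0.0, sd=1.0):
    """Area to the right of s0 under a normal curve, i.e. P(S > s0)."""
    z = (s0 - mean) / sd
    # erfc gives the complementary error function; for a standard normal,
    # P(Z > z) = erfc(z / sqrt(2)) / 2.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p = right_tail_significance(1.645)
print(round(p, 3), round(1.0 - p, 3))  # right-tail and left-tail areas
```

The second printed value, 1 - p, is the left-tail area, which corresponds to the significance of the anti-variate discussed later in Methods.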

TABLE I
Antigenic sites as used in the Monte Carlo computer experiments

Status of residue nearest the peptide COOH-terminus:
  * = cleavage restriction, COOH-terminal arginine or lysine
  0 = cleavage restriction, COOH-terminal methionine
  K = a lysine known to be necessary for peptide antigenicity

Protein and residues      Sequence                  Source

Sperm Whale Myoglobin
  69-78                   (V)LTALGAILKK             (20)
  102-118                 KYLEFISEAIIHVLHSR         (21)
  132-146                 NKALELFRKDIAAKY
Pigeon Cytochrome c
  94-104                  LIAYLKQATAK*              (12)
Influenza Hemagglutinin Protein A/PR/8/34
  111-119                 (SS)FERFEIFPK             (24)
  129-140                 NGVTAACSHEGK              (25)
  302-313                 CPKYVRSAKLRM              (25)
Pig Pro-insulin
  A4-14                   EQCCTSICSLY               (4)
  B5-16                   HLCGSHLVEALY              (27)
Hen Lysozyme
  46-61                   NTDGSTDYGILQINSR          (28)
  74-86                   NLCNIPCSALLSS             (29)
  81-96                   SALLSSDITASVNCAK          (29)
  108-119                 WVAWRNRCKGTD              (30)
Hen Ovalbumin
  323-339                 ISQAVHAAHAEINEAGR         (31)
Hepatitis Pre-S
  120-132                 MQWNSTTFHQTLQ             (32)
Foot and Mouth Disease Virus VP1
  141-160                 VPNLRGDLQVLAQKVARTLP      (33)
Beef Cytochrome c
  11-25                   VQKCAQCHTVEKGGK           (34)
  66-80                   EYLENPKKYIPGTKM           (10)
Hepatitis B S-Antigen HBsAg/adw
  38-52                   SLNFLGGTTVCLGQN           (35)
  95-109                  LVLLDYQGMLPVCPL           (35)
  140-154                 TKPSDGNCTCIPIPS           (35)
Lambda Repressor
  12-26                   QLEDARRLKAIYEKK*          (36)
Rabies Spike Glycoprotein
  32-44                   DEGCTNLSGFSYM             (37)



Each amino acid is represented by its single-letter code, which is the
first letter of its name, except for: arginine (R), aspartic acid (D),
asparagine (N), glutamine (Q), glutamic acid (E), lysine (K), phenylalanine (F),
tryptophan (W), and tyrosine (Y).

The primary source for each antigenic sequence is indicated to its right.

The numbering of the antigenic sequence, when available, was taken
from the primary source, as was the parent protein sequence used in the
Monte Carlo computer experiments. When otherwise unavailable, the
protein sequences were taken from the National Biomedical Research
Foundation protein sequences database (Georgetown University,
Washington, D.C.). If the termini of an antigenic peptide were not determined
by experiment, the peptide was taken to be of length 11 or 12 and centered
around known critical residues. Length 11 was used if the number of
critical residues was odd, 12 if it was even. If the original paper contained
insufficient information to eliminate biasing of the COOH-terminus residue
by either tryptic or cyanogen bromide cleavage, cleavage restriction
was imposed in the statistical analysis. The Methods section explains
these issues more fully.

NH2-extensions of the antigenic site required by amphipathicity
statistics; these residues are not part of the minimal stimulating peptide.
Peptides containing them retain antigenicity, however, according to the
primary source. See under "α-amphipathicity" in subsection 3 of Methods
for further explanation.

Lysines necessary to antigenic activity, according to primary source.

Lysines necessary to antigenic activity (23).

Substitutions of N-129, S-136, and E-138 established criticality of
those residues, and the peptide shown overlapped with an antigenic
"cleavage fragment" in the primary source. As explained in Methods, the
most conservative course is to represent the antigenic site by the peptide
shown.

Insufficient information to ascertain non-biasing of COOH-terminal
residue.

A homologous site in H3 influenza hemagglutinin protein is recognized
by human T cells (26).

Pro-insulin was used as the parent protein: A- and B-chains were
thus combined into a single protein.

Termini not determined by experiment.

A8-10 are critical residues (4).

Tryptic cleavage, according to primary source.

113-114 are critical residues (30).

Residues 23-25 are thought to be necessary to antigenic activity,
although this is not yet established (34).

Cyanogen bromide cleavage, according to primary source.



A property is unusually frequent in the antigenic sites if it has a
low significance (p value). Likewise, it is unusually infrequent if it
has a high p value. We call this anti-significance (this term is not
standard statistical terminology but is convenient for our purposes).
Anti-significance indicates statistical significance of an anti-variate
(e.g., presence of a residue instead of its absence) and, like
significance, it signals an unusual occurrence. If p is the variate
significance, then the significance of the anti-variate is (1 - p): a p of 0.95
is as remarkable as a p of 0.05. This is just one-tailed statistical
significance for the left tail of a distribution, rather than its right
tail.

To assign a significance to the antigenic statistic So, we must
examine what would happen if the antigenic sites were chosen from
a random population. For random sites, we can follow the procedure
above: a) compute the site statistics A and then b) add the site
statistics together to obtain an overall statistic S. The probability
that S exceeds So is the statistical significance of So. Our main
difficulty lies in defining a "random site."

Definition of "random." We give an exact definition of "random"
below. The following metaphor is useful: imagine that the protein
sequences are listed in parallel lines on the ground. Sticks of the
appropriate lengths underscore the antigenic sites within the
sequences. We shall have a computer pick up the sticks one by one
and throw them at random back down onto the sequence from which
they came. Each stick underscores a peptide: these peptides are our
new "random sites." Repeating this procedure generates more random
sites. For reasons given below, sometimes we shall insist that
certain random sites must have a specific amino acid at their COOH-terminus.
This does not change the mechanics of the random selection:
it only restricts where the corresponding stick is allowed to fall.

Before an experiment is performed, the positions of the antigenic
sites are unknown. The above procedure is therefore a reasonably
faithful representation of possible outcomes. Note that we never
scramble the protein sequence in any way. This common (and commonly
fallacious) practice is inappropriate here, because scrambled
proteins do not represent possible experimental outcomes.

We now define "random" precisely: consider, for example, beef
cytochrome c. It contains two antigenic sites, both of length 15. One
of these was produced by cyanogen bromide reaction and had to end
in a methionine. We choose two "random" sites of length 15 within
beef cytochrome: for one of these random sites, any site of length
15 is equally probable; for the other, any site of length 15 ending in
a methionine is equally probable (so the second random site is chosen
from a much more restricted class of sites). Choose other random
sites within the other proteins, according to the corresponding
lengths and number of the antigenic sites within each protein. If an
antigenic site was produced by a tryptic digest, then the
corresponding random site should end in arginine or lysine; likewise,
if it was due to a cyanogen bromide reaction, then the corresponding
random site should end in a methionine. We call this cleavage
restriction.
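
The selection of a random site under cleavage restriction can be
sketched as follows. This is only an illustration under stated
assumptions, not the authors' program: the function names, the
one-letter residue codes, and the toy sequence are hypothetical.

```python
import random

# COOH-terminal residues permitted by each cleavage method, as described
# in the text (one-letter codes: R = arginine, K = lysine, M = methionine).
# The dictionary keys are hypothetical labels chosen for this sketch.
CLEAVAGE_TERMINI = {
    "tryptic": {"R", "K"},
    "cyanogen_bromide": {"M"},
}


def candidate_starts(sequence, length, cleavage=None):
    """Return every start index of a site of the given length whose last
    residue satisfies the cleavage restriction (no restriction if None)."""
    allowed = CLEAVAGE_TERMINI.get(cleavage)
    starts = []
    for start in range(len(sequence) - length + 1):
        last_residue = sequence[start + length - 1]
        if allowed is None or last_residue in allowed:
            starts.append(start)
    return starts


def random_site(sequence, length, cleavage=None, rng=random):
    """Throw one 'stick' of the given length back onto its own sequence,
    with every permitted position equally probable."""
    starts = candidate_starts(sequence, length, cleavage)
    if not starts:
        raise ValueError("no site satisfies the cleavage restriction")
    start = rng.choice(starts)
    return sequence[start:start + length]


# Toy sequence, invented for illustration only.
toy_sequence = "ACDMKLMNPQR"
site = random_site(toy_sequence, 4, "cyanogen_bromide", random.Random(1))
```

Because the restriction only filters the list of permitted positions,
the mechanics of the random choice are the same with or without it.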

Why enforce a cleavage restriction? Tryptic digests (which force
the terminal residue of an antigenic site to be either arginine or
lysine) systematically bias the COOH-terminal residue of an
antigenic site. Lysines at the COOH-terminus of antigenic sites turn
out to be important. Cleavage restriction controls the bias that
tryptic digests and cyanogen bromide reactions introduce into the
COOH-terminal residues.

In theory, all possible experimental biases (even including the
experimenter's avoidance of certain amino acids in peptide
synthesis) should be included in the definition of "random." In
practice, this is not possible; it is even undesirable if it restricts
the pool of possible random sites too much. For example, in the
extreme case, let a particular "random" site always be chosen to
coincide with the corresponding antigenic site (i.e., in the
stick-throwing analogy, the stick underlining the antigenic site never
moves). The contribution A0 that the site makes to the statistic S is
fixed. This is equivalent to eliminating the antigenic site from the
data base.

The emphasis is therefore more on preventing a statistical bias
than on imitating a physical process. Cleavage restriction does not
imitate proteolytic cleavage perfectly. For example, a tryptic digest
is unlikely to produce the hypothetical site Ala-Leu-Val-Gly-Lys-
Lys-Thr-Tyr-Cys-Lys because of the presence of the two internal
lysines. Likewise, a tryptic fragment follows a lysine or arginine in
the original protein sequence. Similar considerations hold for
cyanogen bromide. The definition of "random" does not encompass these
and similar facts. In view of the preceding paragraph, however, and
especially because these facts should not bias our statistics, they
[...]

The information required to eliminate experimental biases is not
always available. In the absence of the relevant information, a site
was always assumed to be subject to cleavage restriction. An example
of this is the antigenic site influenza hemagglutinin 129-[...] in
Table I (25). This site was localized by examining the antigenicity
of hemagglutinin variants and a "cleavage peptide" (the cleavage
method and the precise peptide were unspecified in the reference).
The most conservative course is to assume that the cleavage
localizing the antigenic site was tryptic, and then to subject the site
to cleavage restriction.

Our site statistics are also systematically influenced by site length,
so control of site length is essential. Because the "random" selection
of potential antigenic sites incorporates controls for site length and
cleavage methods, these two variables should not bias our statistics,
in particular our COOH-terminal statistics.

Residue restriction. Unless otherwise stated, cleavage restriction
is always used to control the COOH-terminus of the random sites.
The one exception, used in special cases only, is residue restriction.
Here the antigenic sites are classified by their COOH-terminal
residue: Arg, Lys, Met, and other. Random sites are chosen only from
the same class as the corresponding antigenic site. COOH-terminal
lysines will turn out to be significant correlates of antigenicity: the
intent of residue restriction is to remove the effects of
COOH-termination in lysine and measure independent effects from other
sources. By including restrictions on arginine and methionine,
residue restriction continues to prevent bias from cleavage methods.

A practical example best illustrates the reason we use residue
restriction in certain analyses. Preliminary statistics showed that
random sites having lysine at their COOH-terminus have a higher
α-amphipathicity (as quantitated below) than random sites with
other COOH-terminal residues. They also showed that COOH-terminal
lysines were unusually frequent among the antigenic sites.
Because lysine is a very hydrophilic amino acid, the COOH-terminal
lysines alone may be causing a high antigenic α-amphipathicity.
Because residue restriction controls the number of sites with
COOH-terminal lysines, any significance of α-amphipathicity under
residue restriction argues that α-amphipathicity contributes to
antigenicity independently of the COOH-terminal lysines. In general,
if a statistic retains its significance under residue restriction, its
significance cannot be due to the unusual frequency of COOH-terminal
lysines in Table I.

Monte Carlo computer experiments. We still need a way to estimate
the probability that S exceeds S0. This can be done by a
computer employing Monte Carlo computer experiments (38). The
computer chooses random sites a large number of times. Each time,
the "random" overall statistic S is computed and compared to S0. The
proportion of times that S is greater than or equal to S0 is the
required estimate of the statistical significance of S0. The more
times the computer chooses random sites, the better this estimate of
significance. Each event (S0 ≤ S) is one binomial trial, a 1 or a 0,
and an appeal to the binomial distribution (39) shows that 50,000
computer trials give an estimate of significance accurate to about
±0.005. Accordingly, this was the number of trials used.
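
The Monte Carlo estimate described above can be sketched in a few
lines. This is a minimal illustration, not the authors' program: the
function names and the toy random total are hypothetical, and reading
"accurate to about ±0.005" as roughly two binomial standard errors at
the worst case p = 0.5 is an assumption.

```python
import math
import random


def monte_carlo_significance(observed_total, draw_random_total, trials, rng):
    """Estimate P(S >= S0): the fraction of trials in which the overall
    statistic S of freshly drawn random sites reaches the observed S0."""
    hits = 0
    for _ in range(trials):
        if draw_random_total(rng) >= observed_total:
            hits += 1  # one binomial "success": S0 <= S
    return hits / trials


def binomial_standard_error(p, trials):
    """Standard error of an estimated proportion p after the given trials."""
    return math.sqrt(p * (1.0 - p) / trials)


def toy_random_total(rng):
    # Hypothetical stand-in for "choose 23 random sites and add their
    # site statistics": here each site statistic is uniform on [0, 1).
    return sum(rng.random() for _ in range(23))


rng = random.Random(0)
p_hat = monte_carlo_significance(14.0, toy_random_total, 5000, rng)

# Worst case (p = 0.5) with 50,000 trials, taken at two standard errors.
worst_case_error = 2 * binomial_standard_error(0.5, 50_000)
```

Under that assumption the worst-case error is a little under 0.005,
which is consistent with the number of trials quoted in the text.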

The following analogy justifies the whole procedure. Suppose
you have a set of loaded dice (dice = proteins) which gives sixes more
frequently than they should (faces = antigenic sites, numbers on the
faces = measure of α-amphipathicity, e.g.). You throw all the dice,
some perhaps more than once (find some antigenic sites, perhaps
more than one per protein), and total the resulting numbers (total
the α-amphipathicity scores A for the antigenic sites found). This
gives a total S0 (overall statistic) for the loaded dice.

You now take a set of dice which are not loaded (the computer)
and throw them several times. If their total S is consistently less
than the total S0 from the loaded dice, then the original dice must
indeed have been loaded (e.g., the antigenic sites tend to be more
α-amphipathic than they should be on the basis of chance alone). The
total S is statistically more important than the individual numbers
A on the dice because any individual die is subject to considerable
random fluctuation. Despite this individual fluctuation, a systematic
bias will show itself in the total S0 on the original dice.

This method will determine whether the antigenic sites are
"loaded" with respect to certain peptide properties.

3. Statistics representing the properties: block averaging and
maximization. The site statistics chosen to represent the properties
are to some extent arbitrary. To facilitate programming, many of
these statistics are generated from more elementary block statistics,
numbers that are attached to peptides of a fixed length (blocks)
within the protein. The block statistics must then be converted into
site statistics. There are at least two reasonable procedures for doing
this: a) "block averaging" and b) "block maximization." Block
averaging means averaging the block statistic over all the blocks
completely contained within the antigenic site (similarly, block
maximization). If an antigenic site contains many "ordinary" blocks
along with some immunodominant blocks, averaging dilutes the
contribution that the immunodominant blocks make to the site statistic.
Hence, block maximization is usually the procedure of choice.
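
The conversion from block statistics to a site statistic can be
sketched as below. The function names, the 0-based indexing, and the
toy numbers are assumptions made for illustration; any block function
(a sum, a Fourier intensity, a moment) could be passed in.

```python
def block_statistics(values, block_length, block_fn):
    """Apply block_fn to every overlapping block of the given length.
    A sequence of length L yields L - block_length + 1 block statistics."""
    return [
        block_fn(values[k:k + block_length])
        for k in range(len(values) - block_length + 1)
    ]


def site_statistic(block_stats, site_start, site_length, block_length,
                   reduce="max"):
    """Combine the statistics of all blocks completely contained in the
    site (0-based start) by block maximization or block averaging."""
    if site_length < block_length:
        raise ValueError("site is shorter than one block")
    last_start = site_start + site_length - block_length
    inside = block_stats[site_start:last_start + 1]
    if reduce == "max":
        return max(inside)
    if reduce == "mean":
        return sum(inside) / len(inside)
    raise ValueError("reduce must be 'max' or 'mean'")


# Toy per-residue values, invented for illustration only.
toy_values = [1, 2, 3, 4, 5, 6]
toy_blocks = block_statistics(toy_values, 3, sum)
by_max = site_statistic(toy_blocks, 1, 4, 3, reduce="max")
by_mean = site_statistic(toy_blocks, 1, 4, 3, reduce="mean")
```

In the toy case the maximum is larger than the mean, which is the
dilution effect of averaging that the text describes.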

The statistics are indexed (A, C.i, etc.) identically in the following
and in Table II. For reasons detailed elsewhere (Margalit et al.,
manuscript in preparation, and Cornette et al., manuscript in
preparation), this paper uses the Fauchère-Pliska (40) scale as a
measure of amino acid hydrophobicity.

TABLE II

Statistical significances, cleavage restriction

                                                    Significance
Statistic                                           p        (1 - p)

A. α-Amphipathicity                                 0.017
B. β-Amphipathicity                                 0.855
C. α-Helical Properties
   i. α-Propensity                                  0.031
   ii. Residues (Helix-Makers and -Breakers)
       a. Glutamate Presence                        0.627
       b. Proline Absence                           0.098
       c. Glycine Absence                           0.048
       d. Serine Absence                            0.683
   iii. Moment (Helical Dipole)
       a. Charge                                    0.095
       b. Lysine Charge                             0.042
       c. Histidine Charge                          0.096
       d. Arginine Charge                           0.713
       e. Aspartate Charge                          0.165
       f. Glutamate Charge                          0.524
   iv. COOH-terminal Lysines
       a. 1-Ultimate Lysine                         0.005
       b. 2-Ultimate Lysine                         0.010
D. β-Propensity                                     0.152
E. Turn Propensity                                  0.656
F. Coil Propensity                                  0.976    (0.024)
G. Segmental amphipathicity
   i. Differential Hydrophobicity                   0.843
   ii. Maximum Differential Hydrophobicity          0.887

A. α-Amphipathicity. The first property to be examined is
α-amphipathicity. The intensity of the discrete Fourier transform
provides a site statistic (7). The Fourier transform picks out
periodicities in a sequence of numbers: in this case, it can pick out
the 100° periodicity of hydrophobicities corresponding to an
amphipathic α-helix. a) Divide the proteins into overlapping blocks of
length ℓ. The first block extends from residue 1 to residue ℓ, the
second block from residue 2 to residue ℓ + 1, etc. (If the protein has
length L, then the number of blocks is L - ℓ + 1 (e.g., a protein of
length ℓ contains exactly 1 block).) ℓ = 11 is appropriate, because
Fourier transforms with smaller ℓ do not always reflect periodicities
faithfully (Cornette et al., manuscript in preparation). Because two
minimal antigenic sites, sperm whale myoglobin 69-78 and influenza
hemagglutinin 111-119, in Table I are of lengths less than 11, these
sites are extended for amphipathicity statistics only to make their
lengths 11. The resulting peptides retain near-maximal antigenicity
(20, 24). The NH2-terminus rather than the COOH-terminus was extended
because, as is shown later, COOH-terminal lysines correlate with
antigenicity.

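Let h_k be the hydrophobicity of the kth residue in the protein and
\bar{h}_k be the average hydrophobicity of the kth block (which
consists of residues k to k + ℓ - 1). The intensity of the discrete
Fourier transform of the residue hydrophobicities is:

\[
I(\theta) = \left\{
  \left[ \sum_{j=k}^{k+\ell-1} (h_j - \bar{h}_k)
         \sin\!\left( \frac{2\pi\theta j}{360} \right) \right]^{2}
  +
  \left[ \sum_{j=k}^{k+\ell-1} (h_j - \bar{h}_k)
         \cos\!\left( \frac{2\pi\theta j}{360} \right) \right]^{2}
\right\}^{1/2}
\]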

The Fourier intensity can again be converted to a site statistic in
many different ways. The maximal α-intensity is an appropriate
choice: b) For each block, take the maximum of the Fourier
intensities at θ = 80°, 85°, 90°, ..., 120°. (Unlike the counterpart
statistic (7), the maximal α-intensity does not depend on values
outside the 80° to 120° range.) Because the Fourier intensity at 100°
corresponds to the amphipathicity of residues in an exact α-helical
conformation, the maximization around 100° producing the maximal
α-intensity allows for deviation from exact α-helicity. This
maximization provides a block statistic which is then block-maximized
(as described at the start of subsection 3) to yield a site statistic.
Because maximal α-intensity is the only statistic we use to represent
α-amphipathicity, we shall refer to it as "α-amphipathicity" (see A
above).
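
A sketch of the maximal α-intensity for a single block follows. It is
an illustration under assumptions, not the authors' code: the function
names are hypothetical, residues are indexed from 1 within the block
(a shift of the index changes only the phase, not the intensity), and
the intensity is taken as the modulus of the transform, i.e., the
square root of the sum of the squared sine and cosine sums. The test
block is synthetic, built to have an exact 100° periodicity.

```python
import math


def fourier_intensity(block, theta_deg):
    """Modulus of the discrete Fourier transform of the mean-centered
    block values at the angle theta (degrees per residue)."""
    mean = sum(block) / len(block)
    sin_sum = 0.0
    cos_sum = 0.0
    for j, h in enumerate(block, start=1):
        angle = 2.0 * math.pi * theta_deg * j / 360.0
        sin_sum += (h - mean) * math.sin(angle)
        cos_sum += (h - mean) * math.cos(angle)
    return math.hypot(sin_sum, cos_sum)


# Angles 80, 85, ..., 120 degrees, as in step b) of the text.
ALPHA_ANGLES = range(80, 121, 5)


def maximal_alpha_intensity(block):
    """Block statistic: the largest Fourier intensity over the alpha range."""
    return max(fourier_intensity(block, theta) for theta in ALPHA_ANGLES)


# Synthetic 18-residue block with an exact 100-degree periodicity
# (18 residues x 100 degrees = 5 full turns); not real hydrophobicities.
periodic_block = [math.cos(2.0 * math.pi * 100.0 * j / 360.0)
                  for j in range(1, 19)]
at_100 = fourier_intensity(periodic_block, 100)
at_180 = fourier_intensity(periodic_block, 180)
```

For this synthetic block the intensity is concentrated at 100° and
vanishes at 180°, so the α range picks it up while a β-type
periodicity would not.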

B. β-Amphipathicity. We define the maximal β-intensity similarly,
but maximize the Fourier intensities at θ = 160°, 165°, 170°,
175°, and 180°. (By symmetry, the intensities between 180° and
200° are irrelevant (Cornette et al., manuscript in preparation).) 180°
corresponds to an exact β-sheet. As in "α-Amphipathicity" above, a
block length ℓ = 11 was used and the two shortest antigenic sites
are again extended to this block length.

C. α-Helical properties. The α-helical conformation is
well-investigated, and as such has many different measures and
implications. The number of statistics presented reflects this depth.

i. α-Propensity. a) Divide the proteins into overlapping blocks of
length ℓ (we take ℓ = 9 because this is the length of the shortest
antigenic site). b) Sum the appropriate values in Table 1 of Garnier
et al. (41) (which we refer to as G-O-R Table 1) to calculate the
directional α-helical information for the central (5th) residue in the
block. This generates a block statistic. In a departure from the usual
procedure, block average to produce a site statistic. This gives the
tendency of the entire site to form an α-helix. (Block maximization
would reflect the residue most likely to be in α-helical conformation;
if isolated, this residue is probably not very important.) Because this
statistic attempts a complete representation of α-propensity, we
refer to it as α-propensity. Note that the G-O-R Tables are
based on the statistics of native proteins (41), not short peptides.
This distinction will turn out to be important.

ii. Residue presence and absence. Some residues, notably
glutamate, are "helix-makers" whereas others, notably proline, glycine,
and serine, are "helix-breakers." Helix-makers are frequently found
in α-helices, helix-breakers infrequently. The following statistic,
residue presence, tests whether a residue occurs more frequently in
antigenic sites than at random. a) Assign the residue in question a
value of 1 and all other residues a value of 0. b) Average these
numbers over each site to produce a site statistic and add the site
statistics together in the usual way to produce an overall statistic.
Presence of the residue in question increases this statistic. Changing
the sign of the residue values yields residue absence, which reflects
the absence of the residue in question.

iii. The moment. This is defined in conjunction with a set of amino
acid values. The values are numbers which are assigned to the amino
acids, e.g., hydrophobicity, charge, etc. Unusual moments reflect
nonrandom distribution of the values along the length of a site. We
shall be most interested in charge moments. a) Divide the protein
up into overlapping blocks of length ℓ. We use ℓ = 9. b) Assign to all
the residues in a block numbers indicating their signed distance
from the center of the block. If ℓ is odd, the center residue gets a 0,
the carboxy-terminus residues are labeled 1, 2, 3, ... in sequence
from the center, while the amino-terminus residues are labeled -1,
-2, -3, .... If ℓ is even, there is no center residue, but by analogy
with the above, the residues next to the center are labeled 1/2 and
-1/2, the ones next to those 3/2 and -3/2, and so forth. c) Multiply
the numbers by the value of the amino acid occupying the position.
d) Add the resulting products together. This is the moment of the
values within the block. Maximizing this block statistic produces a
site statistic.

The moment of charge is large whenever either negative side-chains
(Asp/Glu) are near the NH2-terminus or positive side-chains
(Arg/His/Lys) are near the COOH-terminus. This nonrandom charge
distribution is the one required for favorable interaction with the
α-helical dipole and would be expected to correlate with α-helices.

We examine the moments corresponding to the following amino
acid values: a) charge: Arg = Lys = 1, His = 0.3 (His is somewhat
arbitrary), Asp = Glu = -1, all others = 0; b) lysine charge: Lys = 1,
all others = 0; and c) aspartate charge: Asp = -1, all others = 0.
Arginine, histidine, and glutamate charges are defined analogously.
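
The moment of charge can be sketched as follows. The function names,
the one-letter residue codes, and the toy peptides are assumptions for
illustration; the charge values are those listed in the text.

```python
# Charge values from the text (one-letter codes); all other residues are 0.
CHARGE = {"R": 1.0, "K": 1.0, "H": 0.3, "D": -1.0, "E": -1.0}


def block_moment(block, values):
    """Sum over the block of (signed distance from the block center) times
    (amino acid value). Distances are negative toward the NH2-terminus and
    positive toward the COOH-terminus; for an even block length the two
    central residues get -1/2 and +1/2."""
    center = (len(block) - 1) / 2.0
    return sum((i - center) * values.get(residue, 0.0)
               for i, residue in enumerate(block))


def site_moment(site, values, block_length=9):
    """Site statistic: the maximum block moment over all blocks in the site."""
    if len(site) < block_length:
        raise ValueError("site is shorter than one block")
    return max(
        block_moment(site[k:k + block_length], values)
        for k in range(len(site) - block_length + 1)
    )


# Toy peptides, invented for illustration only.
favorable = block_moment("DAAAAAAAK", CHARGE)    # Asp at NH2 end, Lys at COOH end
unfavorable = block_moment("KAAAAAAAD", CHARGE)  # the reverse arrangement
toy_site_moment = site_moment("GDAAAAAAAKG", CHARGE)
```

The sign convention makes the moment large exactly when negative
charges sit toward the NH2-terminus and positive charges toward the
COOH-terminus, the arrangement the text associates with the helical
dipole.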

iv. COOH-terminal lysines. The following are 1/0 statistics, i.e.,
statistics which take the value 1 if the site has a certain property
and 0 otherwise. The 1-ultimate lysine is defined as follows: if the
end-residue on a site is a lysine, then the site statistic is a 1.
Otherwise the site receives a 0. The 2-ultimate lysine is similarly
defined: the site receives 1 if there is a lysine in either of the last
two positions and 0 otherwise. (None of the antigenic sites in Table
I has an antepenultimate lysine, so we arbitrarily terminate the
series of ultimate lysines at 2.) The overall statistic S corresponding
to the 1-ultimate lysine is the sum of the site statistics and is just
the number of sites having lysine at their COOH-terminus. A similar
relationship holds for the other ultimate lysines.
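
The ultimate-lysine statistics are simple enough to state directly in
code. The function names and the toy sites below are hypothetical.

```python
def n_ultimate_lysine(site, n):
    """1/0 site statistic: 1 if a lysine (K) occurs in any of the last n
    positions of the site, 0 otherwise."""
    return 1 if "K" in site[-n:] else 0


def overall_ultimate_lysine(sites, n):
    """Overall statistic S: the number of sites with a lysine in the last
    n positions."""
    return sum(n_ultimate_lysine(site, n) for site in sites)


# Toy sites, invented for illustration only.
toy_sites = ["AAK", "AKA", "KAA"]
s_1 = overall_ultimate_lysine(toy_sites, 1)
s_2 = overall_ultimate_lysine(toy_sites, 2)
```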

The next three statistics represent β-conformations, turns, and coils.

D. β-Propensity. This is exactly analogous to α-propensity except
that we use Table 2 of Garnier et al. (41). Because it is the only
attempt to represent β-propensity, we shall refer to it as β-propensity.

E. Turn propensity. This is analogous to α-propensity, except that
we use Table 3 of Garnier et al. (41).

F. Coil propensity. This is also analogous to α-propensity, except
that we use Table 4 of Garnier et al. (41).

G. Segmental amphipathicity. We give two site statistics that
can represent segmental amphipathicity; many more are possible.

i. Differential hydrophobicity (7). a) Divide the proteins into
overlapping blocks of length 2ℓ (we take ℓ = 4). b) For each block,
take the sum of the hydrophobicities of the ℓ residues on either end
of the block. c) Take the absolute value of the difference of the two
sums. This yields a block statistic; block maximization as described
above yields a site statistic.

ii. Maximal differential hydrophobicity. a) Divide the proteins
into overlapping blocks of length ℓ (we take ℓ = 4). b) For each block,
take the sum of the hydrophobicities of the ℓ residues. c) For every
pair of non-overlapping blocks within the antigenic site, find the
absolute value of the difference of block sums. d) The site statistic
is the maximum of these differences. Maximum differential
hydrophobicity systematizes the procedure that Corradin et al. (10)
carried out by eye for eight antigenic sites.
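
The two segmental statistics can be sketched as follows. The function
names and the toy hydrophobicity values are assumptions for
illustration only; real use would take values from the hydrophobicity
scale cited in the text.

```python
def differential_hydrophobicity(site_h, half=4):
    """Blocks of length 2*half: absolute difference between the sum of the
    first half and the sum of the second half, maximized over the blocks
    contained in the site."""
    block = 2 * half
    if len(site_h) < block:
        raise ValueError("site is shorter than one block")
    return max(
        abs(sum(site_h[k:k + half]) - sum(site_h[k + half:k + block]))
        for k in range(len(site_h) - block + 1)
    )


def maximal_differential_hydrophobicity(site_h, length=4):
    """Largest absolute difference between the sums of any two
    non-overlapping blocks of the given length within the site."""
    if len(site_h) < 2 * length:
        raise ValueError("site has no pair of non-overlapping blocks")
    sums = [sum(site_h[k:k + length])
            for k in range(len(site_h) - length + 1)]
    return max(
        abs(sums[a] - sums[b])
        for a in range(len(sums))
        for b in range(a + length, len(sums))  # b starts after block a ends
    )


# Toy per-residue hydrophobicities, invented for illustration only.
toy_h = [1, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, -1]
adjacent = differential_hydrophobicity(toy_h)
any_pair = maximal_differential_hydrophobicity(toy_h)
```

The first statistic compares only adjacent halves of one block; the
second also compares segments separated by intervening residues, which
is why it is larger for the toy values.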

4. Correlations. For any pair of site statistics X and Y, and for
any 23 sites (whether random or antigenic), we calculate r =
Cov(X,Y)/(σX σY), the correlation coefficient of the 23 (X,Y) pairs.
(r = 1 for perfect correlation (e.g., X = Y), r = -1 for perfect
anti-correlation (e.g., X = -Y), and r = 0 if X and Y are
independent.) r is itself an overall statistic, and its expectation r̄
reflects the coupling of X and Y in random sites. Denote the r for the
antigenic sites by r0. r0 has a statistical significance which can be
estimated by Monte Carlo computer experiments. Because r0 reflects the
coupling of X and Y over the actual antigenic sites, a statistically
significant r0 may reflect an (X, Y) pair which is unusually coupled
within the antigenic sites.
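
The correlation coefficient used as an overall statistic can be
sketched as below. The function name and the toy values are
hypothetical; the text's 23 sites are replaced by four for brevity.

```python
import math


def correlation(xs, ys):
    """r = Cov(X, Y) / (sigma_X * sigma_Y) over paired site statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Toy site statistics, invented for illustration only.
toy_x = [1.0, 2.0, 3.0, 4.0]
r_same = correlation(toy_x, toy_x)                      # X = Y
r_opposite = correlation(toy_x, [-x for x in toy_x])    # X = -Y
r_none = correlation(toy_x, [1.0, -1.0, -1.0, 1.0])     # uncorrelated pattern
```

Computed once for the antigenic sites this gives r0; computed for each
set of random sites it gives the values of r against which r0 is
compared in the Monte Carlo experiment.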

We can illustrate the practical use of correlations by an example.
Take two site statistics (A) called X and Y (e.g., α-amphipathicity
and α-helicity). X and Y may be correlated in random sites (e.g.,
many α-helices in native proteins are amphipathic). X and Y are both
statistically significant: does the significance of X depend on its
correlation with Y? If so, r0 should not be statistically significant: X
and Y should be no more tightly coupled in the antigenic sites than
they are in random sites. However, if the correlation of X and Y in
the antigenic sites is statistically significant compared to their
correlation in random sites, X and Y are more tightly coupled in
antigenic sites than they would be at random. Hence, one might infer
that X and Y contribute independently to antigenicity. When X is
α-amphipathicity, the two shortest antigenic sites in Table I must
again be extended to length 11 as described in subsection 3 under
"α-amphipathicity."

RESULTS AND DISCUSSION 

A. Experimental findings. Tables II and III were obtained
under cleavage restriction. Because results for
residue restriction (not shown) were similar, we conclude
that other significances were independent of COOH-terminal
lysines.

The results in Tables II and III, detailed in brief below
and discussed in greater depth in Part C, have important
implications for the manufacture of peptide vaccines.
These implications are as follows: if possible, peptide
vaccines should probably be those protein segments a)
which have a propensity to form amphipathic α-helices,
b) which do not have regions with a propensity to coil
conformations, and c) which have a lysine at their
COOH-terminus. The last two observations are of particular use
in manufacturing peptide vaccines: they indicate where
the synthetic peptides should be terminated.

α-Helical properties. All of these were strongly represented
in the antigenic sites, suggesting that many antigenic
sites take an α-conformation. Of these properties,
α-amphipathicity was the most significant. The correlation
of α-amphipathicity and α-propensity had a significance
of p = 0.136, suggesting that the two properties
may make independent contributions to T cell antigenicity.

TABLE III

Correlation statistical significances, cleavage restriction

Statistics X and Y         Expected       Antigenic      Significance
                           Correlation    Correlation    p

X. α-Amphipathicity
Y. α-Propensity            0.:26p         0.479          0.136

X. α-Propensity
Y. β-Propensity            -0.368         -0.652         0354

X. β-Propensity
Y. Turn Propensity         0.082          0.452          0.041

X. β-Propensity
Y. Coil Propensity         -0.022         0.445          0.013

COOH-terminal lysines. Lysines were unusually frequent
at the COOH-termini of antigenic sites. This could
not be an artifact of tryptic digestion, because cleavage
restriction controlled potential biases from that source.
As Table I records, experimental removal or substitution
of these COOH-terminal lysines often destroys antigenic
activity. This fact, never before noted as a general
observation, may be useful in designing the COOH-termini of
synthetic peptides used for vaccination. In addition, the
significance of α-properties and COOH-terminal lysines
may suggest something general about the chemistry of T
cell recognition (see Part C).

Conformational propensities. The sites displayed
some β-propensity, but no β-amphipathicity. Turn propensity
was not significant, but coil propensity was in
fact anti-significant; thus coils were notably absent in
the antigenic sites. β-Propensity was significantly correlated
with turn and coil propensities and significantly
and strongly anti-correlated with α-propensity. These
significances perhaps suggest that some antigenic sites take
β-conformations but have their β-propensity masked by
the anti-correlating α-helices.

Segmental amphipathicity. Segmental amphipathicity
was not statistically significant. In case some sites
were masking the segmental amphipathicity of others,
we tested several different subsets of the antigenic sites,
in particular, those in hen lysozyme for which segmental
amphipathicity was first invoked (7, 13) and the subset
in Table III of Corradin et al. (10). No subset tested showed
significant segmental amphipathicity.

Section C (Discussion of Experimental Results) offers
explanations for these findings. Before this, however, we
give a discussion of our statistical methodology.

B. Discussion of the statistical approach: equidistribution
of ignorance. Our statistics are based on the
"equidistribution of ignorance." Because we have no particular
reason to favor one of several alternatives, we
assign all the alternatives an equal probability. In some
branches of science, in particular statistical physics,
equidistribution of ignorance can be justified from first
principles by the so-called ergodic theorems (42). No such
justification can be invoked in biology. In biology, the
equidistribution is purely the figment of our own ignorance.

In this report our initial ignorance is tempered only by
knowledge of experimental conditions, such as the use
of tryptic digestion or cyanogen bromide reaction. Our
definition of "random" includes these factors. Our admission
of ignorance, the definition of "random," provides a
benchmark against which the Monte Carlo method is
applied, and the value of a statistic S0 is compared against
the values S we expect. Any statistic S which is statistically
significant can then be used predictively. This process
reflects a change in the state of our ignorance.

Stratification. In theory, we can use our new knowledge
to change the a priori probability of various sites.
Take α-propensity as an example. Table II indicates
that experiment tends to find sites with a larger α-propensity
than the random selection produces (p = 0.031).
Our definition of "random" can be altered to encompass
this fact. We can stratify the random sites into several
groups according to their α-propensity. The computer
experiment can be run again, but this time we always
select a particular random site from the same stratum as
the corresponding antigenic site. (This stratification is
analogous to the stratification by cholesterol level that
would determine whether smoking, independent of cholesterol
level, was a factor in heart disease.) Residue
restriction is a particularly simple form of stratification.
The sites are stratified into four classes on the basis of
their terminal residues: Arg, Lys, Met, and other. Random
sites are then chosen from the same stratum as the
corresponding antigenic site.

Stratification could, for example, separate the relative
contributions of α-propensity and α-amphipathicity to
antigenicity. We attempted the separation using correlation
significance instead because stratification requires
more programming effort.

Avoiding spurious significance. This paper obeyed
one very important methodological maxim: we tested only
those statistics consistent with a physical hypothesis. If,
without a reason to do so, we had tested all 20 amino
acids for some property, one amino acid would probably
have been significant at p < 0.05 by chance alone.
Approaches not based on physical theories run a higher risk
of producing spurious significances.
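
The arithmetic behind this warning can be made explicit. The sketch
below assumes the 20 tests are independent, which the text does not
state; the function names are hypothetical.

```python
def chance_of_spurious_hit(tests, alpha):
    """Probability that at least one of several independent tests reaches
    significance level alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** tests


def expected_spurious_hits(tests, alpha):
    """Expected number of tests significant at level alpha by chance."""
    return tests * alpha


# Twenty amino acids, each tested at p < 0.05.
at_least_one = chance_of_spurious_hit(20, 0.05)
expected = expected_spurious_hits(20, 0.05)
```

Under that independence assumption, about one spurious significance is
expected among 20 tests, and the chance of at least one is roughly
64%.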

Physical interpretation and predictive use of statistics.
Despite our emphasis on physical interpretation, the
G-O-R Tables (41) predict native structure with only 57%
accuracy (43). It is unlikely that our antigenic propensities
correlate much better with either native or peptide
conformations. How then should the statistics be
interpreted?

The significances of various conformational propensities
depend on the statistics used to measure them. If, for
example, the G-O-R Tables are applied to a residue without
considering the surrounding residues, there is no
statistical significance. The second law of thermodynamics
(as applied to information) states that every irreversible
transformation or simplification of raw data destroys
information, e.g., if someone else cannot recover
your data base after you have manipulated it, you have
destroyed information. Our statistical method, because it
can utilize raw data, demonstrates this destruction of
information as a loss of statistical significance.

By the same token, if a statistic is significant we can
be confident that every step in its production probably
preserved some information, especially if the statistics
were based on a physical hypothesis. The G-O-R Tables
can be considered statistical correlates of the free energies
of amino acid interactions. Presumably, enough information
about these free energies was preserved to
make our statistics significant. Our statistics will not
always yield correct predictions, conformational or otherwise,
however. In fact, the conformation of a peptide
recognized by a T cell may perhaps sometimes differ from
its conformation in native protein (7, 44), making conformational
verification even more difficult.

In the terminology of the dice-rolling analogy in the
Statistical Methods, if some dice are loaded towards
sixes, we can detect this by rolling the dice a sufficient
number of times. Despite being loaded, the dice will sometimes
produce ones. In the case of the T cell antigenic
sites, the sites are loaded with, for example, extra propensity
towards α-conformations. In analogy with the
dice, the statistical methods of this study can only give
rise to statistical predictions, and sometimes the conformational
predictions from propensities will be wrong.
Despite this, one is better off predicting from statistically
significant parameters than at random.

These considerations should not obscure the fact that
the statistical significances reflect biochemical properties
of T cell antigenic sites. This paper used p values as
an instrument to discover biochemical properties of T
cell antigenic sites and evaluate the extent to which these
properties are significant indicators of T cell antigenicity.
Later reports will use the statistics presented here
predictively.

C. Discussion of experimental results. Tables II and 
III summarize our results. 

Our statistics are consistent with a "conformational
hypothesis": helper T cell immunodominant sites tend to
be peptides with strong conformational propensities
that can stabilize under hydrophobic interaction with
class II MHC proteins. The conformational hypothesis is
an extension of the amphipathicity hypothesis (7), which
does not consider conformational propensities.

α-Properties. A consistent significance for α-properties
emerged, suggesting that most T cell antigenic sites take
an α-helical conformation. α-Amphipathicity and α-propensity
are both significant (p = 0.017 and p = 0.031).
Moreover, their correlation may also be significant (p =
0.136). Hence, α-amphipathicity may be a significant
factor in T cell antigenicity independent of its correlation
with α-propensity. Antigens stimulating helper T cells
may bind to the class II protein through hydrophobic
interaction (1, 13, 15); because recognition occurs at the
interface between a class II protein at the antigen-presenting
cell surface and an aqueous environment,
α-helical amphipathicity may help to stabilize antigens in
α-helical conformation. This forms the basis of the
so-called amphipathicity hypothesis (7).

Helix-makers and -breakers. α-Helical conformations,
whether amphipathic or not, should display the
characteristics mentioned in the Introduction. The
helix-breakers proline and glycine should be infrequent (p = 0.098
and p = 0.048). The next helix-breaker tested, serine,
was not statistically significant (p = 0.683). Similarly,
the helix-maker glutamate was not present in unusual
amounts (p = 0.627). In accord with the end of the
Discussion on Statistical Methods, tests for helix-making
and -breaking significance ended here.

The intrinsic dipole. The next physical consequence
of α-helicity we examined was the intrinsic dipole and its
favorable charge distribution, represented by the
moment of amino acid charge. The moment of amino acid
charge has a significance of p = 0.095. (Recall that our
statistics were controlled for tryptic digestion and its
biasing of charge distribution. Those biases could not
influence the results.) When the moments for individual
residue charges are examined, lysine and perhaps
histidine are significant (p = 0.042 and p = 0.096), whereas
arginine, aspartate, and glutamate are not (p = 0.713,
0.165, and 0.524, respectively).

COOH-terminal lysines. Once attention is drawn to
lysine, Table I shows its positional preferences quite
strikingly. Lysine, appearing near the COOH-terminus of
antigenic sites far more often than its frequency in
proteins warrants, is often necessary for antigenic activity.
The significance of the 1- and 2-ultimate lysines in Table
II is remarkable (p = 0.005 and p = 0.010).

³ Margalit, H., J. L. Spouge, J. L. Cornette, K. Cease, C. DeLisi, and
J. A. Berzofsky. 1986. Prediction of immunodominant helper T-cell
antigenic sites from the primary sequence.

The ultimate lysines contain a subtle point. Because
the 1- and 2-ultimate lysines are more statistically
significant than α-amphipathicity (p = 0.017) and
α-propensity (p = 0.031), these last two qualities cannot
convincingly explain the tendency of antigenic sites to terminate
in lysines.

Recall that the directional information of the G-O-R
tables (41), the basis of our "propensity" statistics, is
based on native peptide conformations (45). The G-O-R
Table 1, if examined, does indeed reflect the stabilizing
influence of COOH-terminal lysines on native α-helices
(about 6 kcal/mol ≈ 10 kT, for 10-residue helices in native
protein (17)). Despite this, relative to α-propensity,
COOH-terminal lysines are unusually significant. This suggests
that COOH-terminal lysines stabilize α-helices much
more in free peptides than they do in native proteins.
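
The equivalence "6 kcal/mol ≈ 10 kT" quoted above can be checked with a short calculation. The sketch below assumes a temperature of 300 K, which the text does not state, and uses the molar gas constant R so that RT per mole stands in for kT per molecule. The function name is illustrative only.

```python
# Molar gas constant in kcal/(mol*K); R*T per mole corresponds to k*T per molecule.
R_KCAL_PER_MOL_K = 1.987204e-3


def kcal_per_mol_to_kT(energy_kcal_per_mol, temperature_k=300.0):
    """Express a molar energy as a multiple of the thermal energy kT.

    The default of 300 K is an assumption; the paper does not give
    the temperature behind its "10 kT" figure.
    """
    return energy_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k)


if __name__ == "__main__":
    # Stabilization quoted for COOH-terminal lysines on 10-residue helices.
    print(kcal_per_mol_to_kT(6.0))
```

At any temperature near room or body temperature the result stays close to 10, so the rounded figure in the text does not depend strongly on the assumed value.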

In native proteins, α-helices have no free backbone
charges. Free peptides, by contrast, have an extra free
charge on both their NH2- and COOH-termini. If a free
peptide has an α-helical conformation, electrostatic
interactions overwhelmingly favor the conformation
placing the terminal carboxy-charge away from the backbone
carbonyl groups near the α-helical axis. This swings the
side-chain underneath the helical dipole in the free
peptide, rather than to one side as occurs in the native
protein. (Similar considerations apply to the
NH2-terminal.) The dipole field is stronger along the dipole axis, so
terminal-charged side-chains interact more strongly with
the electrostatic field of the intrinsic dipole in free
peptides than in native proteins. Hence, favorable charge
distributions are much more decisive in stabilizing
α-helices in free peptides than they are in native proteins.
This effect may be most apparent with lysine because, of
all charged residues, lysine has both the most mobile
side-chain and the most localized charge.
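
The statement that the dipole field is stronger along the dipole axis can be made quantitative for an idealized case. The sketch below treats the helix dipole as an ideal point dipole, which is a simplifying assumption and a poor approximation at distances comparable to the helix length. For a point dipole the field magnitude at fixed distance varies with the polar angle θ as sqrt(1 + 3 cos²θ), so the on-axis field is twice the field at the same distance to the side.

```python
import math


def dipole_field_ratio(theta_deg):
    """Field magnitude of an ideal point dipole at polar angle theta,
    divided by the magnitude at the same distance in the equatorial plane.

    For a point dipole, |E| is proportional to sqrt(1 + 3*cos(theta)**2) / r**3,
    so the distance dependence cancels in the ratio.
    """
    theta = math.radians(theta_deg)
    return math.sqrt(1.0 + 3.0 * math.cos(theta) ** 2)


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(angle, dipole_field_ratio(angle))
```

Under this idealization, a charged side-chain placed beneath the helix end (θ near 0°) would experience up to twice the field felt by one placed to the side (θ near 90°) at the same distance.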

For very short peptides, which cannot have many
α-helical hydrogen bonds to stabilize them, the extra stability
may be very important. For example, residues 111-119
in influenza hemagglutinin are consistent with this
hypothesis, as are residues 69-78 in sperm whale
myoglobin, the latter ending in not just one, but two lysines.
The penultimate lysine probably further stabilizes the
conformation by electrostatic interaction with the peptide
carboxy-charge.

Other explanations for the COOH-terminal lysines are
possible. Although they might result from enzymatic
cleavage during antigen processing, there is little
evidence for marked enzymatic specificity in
antigen-processing cells (46). Another plausible explanation might
have some or all of these lysines interacting directly with
T cell receptors or class II molecules on APC. Pincus et
al. (11), for example, implicate α-helical conformations
in the antigenic activity of pigeon cytochrome c residues
94-104. They could not distinguish whether the
COOH-terminal Lys-104 was necessary for T cell receptor
interactions or for maintenance of an α-helix, however.
Indeed, if the Lys-104 depends on the α-helical
conformation for contact with the T cell receptor, distinguishing
these alternatives could be meaningless. Possibly, the
distinction could be drawn in other cases, and some
COOH-terminal lysine may be necessary for T cell
interaction alone.

Other conformational properties. The significance of
β-propensity (p = 0.152) might indicate that perhaps a
few antigenic sites are β (i.e., extended) structures. This
hypothesis is bolstered by considering the strong
anticorrelation of α- and β-propensities (r = -0.369 and r0 =
-0.646). Although most antigenic sites take
α-conformations (7), β-propensity still retains some significance,
so some antigenic sites probably do assume
β-conformations. There is also an unusual correlation of β-propensity
with turn and coil propensities in the antigenic sites (p =
0.041 and 0.013). The correlation significances are
difficult to explain without the presence of β-conformations.
Single β-strands are unstable, however, so the
significances are explicable if the extended conformations
stabilize as at least two β-strands joined by a coil or turn.
The absence of β-amphipathicity (p = 0.855) is similarly
explained because the coil or turn, being of uncertain
length, destroys the two-residue periodicity required by a
significant β-amphipathicity. Also, β-structures are often
twisted, again reducing the required periodicity.
β-Amphipathicity would help to stabilize a β-sheet at an
aqueous interface and may well be present, although
undetectable by our techniques.
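
The argument that a coil or turn of uncertain length destroys two-residue periodicity can be illustrated numerically. The sketch below is not the statistic used in this study; it substitutes a simple periodic hydrophobic moment, summing Kyte-Doolittle hydrophobicities around a circle at a fixed angle per residue (180° for the two-residue period of a β-strand, about 100° for an α-helix). The function name and the toy sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodic_moment(sequence, angle_deg, scale=KYTE_DOOLITTLE):
    """Magnitude of the hydrophobicity vector sum when successive
    residues are rotated by angle_deg around a common axis."""
    delta = math.radians(angle_deg)
    x = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(sequence))
    y = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(sequence))
    return math.hypot(x, y)


if __name__ == "__main__":
    one_strand = "VKVKVKVKVKVK"           # uninterrupted alternation
    two_strands = "VKVKVK" + "G" + "VKVKVK"  # one-residue turn shifts the phase
    print(periodic_moment(one_strand, 180.0))
    print(periodic_moment(two_strands, 180.0))
```

In the second toy sequence each strand is perfectly amphipathic on its own, yet the single-residue turn puts the two strands out of phase and their contributions cancel, so a whole-peptide periodicity measure sees almost no signal.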

The anti-significance of coil propensity is crucial to
any theory that immunodominant sites tend to have
stable conformations (p = 0.976, i.e., 1 - p = 0.024). If
peptide conformation tends to be more random when the
peptide has coil propensity, the theory would be much
weaker without this anti-significance. The significant
correlation of coil and β-propensities suggests that the
few coils occurring in the antigenic peptides join
β-strands.

Our results indicate that T cells tend to recognize the
parts of a foreign molecule which have the greatest
propensities to α-, and perhaps β-, conformations. The
propensities of T cell antigenic sites are perhaps quite
difficult for an organism to change because they may
correlate with structure-function properties of proteins. In this
respect, teleologically, the T cell recognition system would
appear an excellent complement to the antibody system,
which recognizes the surface features of a protein. If the
structure-function supposition is correct, the T cells must
of necessity recognize both α- and β-conformations,
because otherwise evolution would favor the unrecognized
structure to evade detection.

Segmental amphipathicity. Table II shows that
segmental amphipathicity is not correlated with the
experimentally determined antigenic sites (p = 0.843 and
0.887). Our statistics consistently confirmed the absence
of segmental amphipathicity in antigenic sites. Because
segmental amphipathicity has enjoyed unwarranted
currency on the basis of anecdotal evidence, we discuss it at
length. Some possible arguments to justify the lack of
statistical significance follow.

The statistics chosen may not reflect the property
"segmental amphipathicity." This argument can be raised
against any statistic purporting to represent a property.
Differential hydrophobicity is the only previous attempt
to give this property a quantitative operational definition
(7). Any operational definition can be checked for
statistical significance: maximal differential hydrophobicity,
systematizing the procedure of Corradin et al. (10), is
another such definition. Neither definition showed
statistical significance.

If segmental amphipathicity is present only in a subset
of the antigenic sites, the remaining sites might mask its
statistical significance. Those sites in hen lysozyme for
which the segmental amphipathicity hypothesis was first
invoked (7, 13) might then be this subset, or perhaps the
subset examined by Corradin et al. (10). Several different
subsets of the antigenic sites, including the two
mentioned, have been tested on their own for segmental
amphipathicity; we also tried several different
hydrophobicity scales, including the Kyte-Doolittle (47) scale that
Corradin et al. (10) used; segmental amphipathicity was,
again, not significant.

We have been most thorough in trying to detect
segmental amphipathicity. Although no statistic can ever
rule out individual exceptional circumstances, our
statistical criteria give absolutely no support to the
segmental amphipathicity hypothesis. Where applicable in
protein and DNA research, the statistical method of this
study provides an objective criterion for determining
significance. The case of segmental amphipathicity shows
that anecdotal evidence can be very misleading.

The absence of segmental amphipathicity is also
consistent with the conformational hypothesis. Assume, for
the sake of argument, that a T cell antigen is recognized
while it is in the amphipathic environment overlying a
shallow hydrophobic protein cleft. Because a segmentally
amphipathic peptide cannot penetrate far into the
hypothetical cleft, any attempt to form regular internal
hydrogen bonds leads to exposure of hydrophobic residues.
This effect produces conformational destabilization.
Segmental amphipathicity therefore runs counter to the
conformational hypothesis if antigenic stabilization takes
place in such a hypothetical shallow cleft.

These studies have brought to light a number of
properties associated with immunodominant antigenic sites
for helper T cells. These may be useful in the rational
design of synthetic vaccines. It should be cautioned,
however, that in any given individual, only a subset of such
sites will be seen, depending on MHC-linked immune
response genes, self-tolerance to homologous
self-antigens, and other genetic and environmental constraints
of the host (3-5, 14).